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Abstract 

To be practi cal, recognition system mst deal with uncertainty Rsitions of image 
features inscenes wy Features sonatinas fail to appear because of unfavorable illu¬ 
mination. In this w>rk, mthods of statistical inference are combined wtherprical 
mdels of uncertainty in order to emirate andrehne hypotheses about the occurrence 
of a knowi object in a scene. 

Robalilistic mdels are used to characterize image features and their correspon¬ 
dences. A statistical approach is taken for the acquisition of object mdels from 
observations in images: Mean Edge Images are used to capture object features that 
are reasomali y stali e wth respect to mri ati ons i n i 11 unhmati on 

The A i gnrnn t approach to recognition, that has been described by Hutted ocher 
and U11 ran, i s used The mchad sm that are empl qyed to generate i d ti al hypothe¬ 
ses are distinct fromthose that are used to verify (and refine) them In this w>rk, 
posterior probability and MaximmLikelihood are the criteria for evaluating and 
refiring hypotheses. The rec qgn i ti on strategy advocated in this work ray be sum 
rarizedas Align Refine Veri fy, whereby 1 ocal search in pose space is dili zed to refine 
hypotheses fromthe alignment stage before verification is carried out. 

Tw> f orrd ati ons of model - based obj ect recogri ti on are descri bed MP Mdel 
MtcHng evaluates joiri hypotheses of natch and pose, while Rsterior Mrginal 
Rse Btination evaluates the pose oriy local search in pose space is carried out 
wth the Expectation-Mxirhzation (ETvt algorithm 

Recogri ti on experimds are described where the EMalgori thirds used to refine 
and evaluate pose hypotheses in2Dand3D Initial hypotheses for the 2Dexperi rents 
were generated by a simple indexing method Agle Rir Indexing. The linear 
(bntfl nation of View method of III ran and Basri is enjio^d as the projection 
mdel in the 3Dexperiremits. 
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Chapt e r 1 


Int r oduct ion 

Vsual object recognitionis the focus of the research reported in this thesis. ftcogni- 
ti on nast deal wth uncertai nty to be practi cal. Bsi ti ons of i rage features bel o ngi ng 
to objects in scenes wy Eatures sonatinas fail to appear because of unfavorable 
illunhnation In this work, nathods of statistical inference are corMned wtherpr- 
ical mdels of uncertainty in order to emirate hypotheses about the occurrence of a 
knovn object in a scene. Qher problem, such as the generation of initial hypotheses 
and the acquisition of object rodel features are also addressed 

1.1 The Problem 

ftpresentati re recogni ti on proli em and tbei r sol uti ons are i 11 ustrated i n E gures 1-1 
and 1-2. The proliemis to detect and locate the car in digitized video irages, using 
previously avail alie detail ed inf ornati on about the car. Inthese figures, object mdel 
features are superimposed over the video i rages at the positi on and orientation where 
the car ras found Egure 1-1 show the results of 2Drecognition, vhile Egure 1-2 
illustrates the results of 3Drecognition These irages are fromexperinants that are 
described in Chapter 10. Radical solutions to proli em life these wl 1 infrore the 
flexibility of robotic system. 
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CHAPTER 1. INIRCDUCnOSf 



E gure 1-1: Rpresentati re Rc qgn i ti on Roli ernaixl Sol uti on (2E) 



E gure 1- 2: Kpresentati re Rcqgni ti on Roli ernaixl Sol uti on (3E) 





1.2. THE APPROACH 


13 


In this w>rk, the recognition problems restricted to finding occurrences of a single 
object in scenes that nay contain other unknown objects, ftspite the simplification 
and years of research, the proli emrenai ns 1 argel y unsol ved R)bust system that 
can recogni ze smoth obj ects had ng si x degrees of freecbmof posi ti on, under varyi ng 
conditions of i 11 u nnn ati on, occlusion, and background, are not conmrcial 1 y aval 1 alie. 
Meh effort has been expended on this proliemas is evident in the comprehensive 
review of research in computer-based obj ect recognition fy Btsl and Jain [5], who 
cited 203 references, and (hin and Qer [18], who cited 155 references. The goal of 
this thesis is to characterize, as veil as to describe howto find, robust solutions to 
visual object recogni t ion proli em. 


1.2 The .Approach 

In this vork, statistical nathods are used to evaluate and refine hypotheses in object 
rec ogn i ti on. Agle Rir Indexing, a mans of generating hypotheses, is introduced 
These mchanismare used in an extension of the Aignmnt mthodthat includes a 
pose refinemnt step. Ekiiof these components are anplifled below 

1.2.1 Sbctisticcl T^pudi 

Intli s research, vi sual obj ect recogni ti oni s approached vi a the pri nei pi es of Mxi mm 
likelihood (ML) and MxinnmAIbsteriori probability (ME). These principles, 
along wth specific probabilistic rodels of aspects of object recognition, are used to 
derive obj ect ive functions for evaluating and refining recogni ti on hypotheses. The ML 
and MP criteria have a long hi story of successful applicationinfornnlating decisions 
and in mking estimates fromobeerved data. They have attractive properties of 
optimality and are often useful when masuremnt errors are significant. 

In other areas of computer vision, statistics has proven useful as a theoretical 
framvork. The vork of Yuille, (Siger and Blthoff on stereo [78] is one example, 
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vhi 1 e ini nage restorati on the work of Gkan and Gkan [ 28], Mrroqui n [ 54], and 
Mrroquin, Mtter andRggio [55] are others. The statistical approach that is used 
in this thesis concerts the rec qgr ii ti on probl emi nto a, vd 1 defined (althorgh not nec¬ 
essarily easy) optimization problem This has the advantage of providing anexplicit 
characterization of the probl erg while separating it fromthe description of the algo- 
ri thm used to sol ve i t. A1 hoc obj ecti ve f uncti ons have been profitaH y used i n som 
areas of conputer vision Such an approach is used by Brnard in stereo Hatching 
[ 2], B ake and Zi ssernan [7] ini nage restorati on and ftveri cjge, W ss and Rsenan 
[6] inline segmnt based mdel matching. Whthis approach, pi ausi lie form for 
conponents of the objective function are often contaned using trade-offparamters. 

Such trade-off paranaters are deternhned erp ri cal 1 y. Ai advantage of deri vi ng ob- 
jective functions fromstatistical theories is that assumptions becona explicit - the 
form of the obj ecti ve f uncti on conponents are cl earl y rel ated to sped fic probali 1 i sti c 
mdels. If these mdel s fit the domain then there is som assurance that the resulting 
cri teri a wl 1 performed 1. Asecond advantage i s that the trade- off paranaters i n the 
objective function can be derivedfrommasuralie statistics of the domain 

122 FeEture-BaBed Bocgiticn 

This vorkuses a feature- based approach to obj ect recognition Ratures are abstrac¬ 
tions like points or curves that summarize som structure of the patterns in an image. 
There are several reasons for using feature based approaches to object recognition 

• Ratures can concisely represent objects and images. Ratures derived from 
hri ghtness edges can sunned ze the i mportant events of an i rage i n a way that 

is reasonably stable vithrespect to scene illunhnation 

• In the al i gnmnt approach to recogni ti on (to be descri bed sbortl y), hypotheses 
are veri fled by proj ecti ng the obj ect mdel i nto the i rage, then conpari ng the 
predi cti on agai nst the i rage, f^ usi rg compact, feature- based representati ons 
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of the object, proj ecti on costs ray be kept low 

• Katures also facilitate hypothesis generation Indexing nathods are attractive 
mchanism for hypothesis generation Such nathods use tables indexed by 
properties of snail groups of irage features to quickly locate correspondng 
mdel features. 

Object Entires fromOteervation 

Anajor issue that mst be faced in mdel-based object recognition concerns the 
origin of the object mdel itself. The object features that are used in this w>rk are 
derivedfrorractual irage observations. This mthodof feature acquisitionautonat- 
ically favors those features that are lihely to be detected in irages. The potentially 
diffidt proliemof predicting irage features fromahstract geonatric rodels is by¬ 
passed This prediction proliemis ranagealie in som constrained cbrains (wth 
polyhedral objects, for instance) but it is often diffiult, especially wth smoth ob¬ 
jects, lowresolution irages and lighting variations. 

K>r robustness, sinple local irage features are usedintds w>rk Eatures of this 
sort are easily detected in contrast to extended features like line segmnts. Extended 
features have been used in som system for hypothesis generation because their ad 
ditional structure provides rare constraint than that offered by si npl e local features. 
Extended features, nonetheless, have dravbacks in being diffiult to detect due to 
occlusions and localized failures of inage contrast. Ifrause of this, system that rely 
on distinguished features canlose robustness. 

123 Ai^mt 

hypothesize-and test, or aligmmnt nathods have proven effective in visual object 
recognition fitted ocher and III nan [43] used search over nhrinal sets of corre¬ 
sponding features to establish candidate hypotheses. In their vorkthese hypotheses, 
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or a 1 i gnments, are tested by projecting the object mdel into the inage rising the 
pose (posi ti on and ori entati on) i npli ed by the hypotbesi s, and then by perfor nhng a 
detailed comparison wth the inage. The basic strategy of the alignnant mtbodis 
to use separate mcbanism for generating and testing hypotheses. 

Kcently, indexing nathods hare becom arailalie for effiiently generating hy¬ 
potheses i n recogni ti on. These nathods avoi d a si gni hcant arnunt of search by usi ng 
pre-computed tables for looking up the object features that light correspond to a 
group of inage features. The geonatric hashing mthodof landanand Wfson [49] 
uses invariant properties of snail groups of features under affie transformations as 
the look-up key Genans and Jacobs [19] [20], and Jacobs [45] described indexing 
nathods that gai n effii eney by usi ng a feature grouping process to select snail sets 
of inage features that are likely to belong to one object in the scene. 

Intlis w>rk, asinpleforirof 2Dindexing, Angl e Pai r Indexi ng, is usedtogenerate 
initial hypotheses. It uses an invariant property of pairs of inage features under 
translation, rotation and scale. This is described in Giapter 9. 

The hbugh transform[40] [44] is another connonly used nathod for generating 
hypotheses in object rec ogn i ti on. In the hbugh nathod, feature-based clustering is 
perfornad in pose space, the space of the transformations describing the possible 
nation of the object. This nathod was used by Gimon and Iozano-Krez [36] to 
1 ocal i ze the search i n recogni ti on 

Thesefast nathods of hypothesis generation provide ongoing reasons for usingthe 
alignnant approach They are often most effective when used in conjunction wth 
verification NAibcation is important because indexing nathods can be susceptible 
to table collisions, while hbugh nathods sonatinas generate false positives due to 
thei r aggregati on of i neonsi stent evi denee i n pose space li ns. Thi s 1 ast poi nt has been 
argued by Gi mon and hbttenl ocher [ 35]. 

The usual alignnant strategy ray be surmri zed as align verify. Aignnant and 
verification place differing pressures on the choice of features for rec ogn i ti on. Mch- 
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ari sm used for generati ng hypotheses typi cal 1 y bare conputati onal coi|i exi ty that 
is polynomal intbe nunher of features inrelred Rcause of this, there is significant 
advantage to using lowresolution features - there are fever of them Ihfortunately, 
pose esti nates based on coarse features tend to be less accurate than those based on 
high resolution features. 

Iikewse, verification is usually rare reliable wth high resolution features. This 
approach yi el ds rare detailedconparisons. These differing pressures ray be accom 
mdated by enploying coarse-fine approaches. The coarse-fine strategy was utilized 
successfully in stereo by (Jimon [33]. In the coarse-fine strategy, hypotheses de¬ 
rived froniowresoluti on features limt the searchfor hypotheses deriredfromhigh- 
resolution features. There are sona potential diffiulties that arise when applying 
coarse-fine mthods in conjunction wth 3Dobject mdels. These ray be avoided 
by using view based alternatives to 3Dobject mdeling. These issues are discussed 
rare ful 1 y i n Chapter 4. 


Aign Rfine Verify 

The recogni ti on strategy advocated i n tbi s w>rk my be sunmri zed as al i gn refine 
verify. Tbi s approach has been used by Ii pson [ 50] in refining alignmnts. The key 
observation is that local search in pose space my be used to refine the hypothesis 
fromtbe alignmnt stage before verification is carried out. In hypothesize and test 
mthods, the pose estimtes of the initial hypotheses tend to be somwhat inaccurate, 
since they are based on rsm ml sets of corresponding features. litter pose estimtes 
(hence, better vorificat ions) are li My to result frormsing all supporting imge feature 
data, rather tbanasmll subset. Chapter 8 describes amtbodtbat refines the pose 
estimte while sinnltaneously identifying and incorporating the constraints of all 
supporting imge features. 
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1.3 Gui de to Thesi s 

Biefjy the presentation of the naterial in this thesis is essentially bottomup. The 
early chapiters are concerned wth building the conponents of the fornnlation, while 
the min contributions, the statistical formlations of object recognition, are de¬ 
scribed in Chapters 6 and 7. Ater that, related algorithm are described, followd 
by experi nsnt s and conel usi ons. 

In mre detail, Chapter 2 describes the probabilistic mdels of the correspon¬ 
dences, or napping betveenimge features and features belonging to either the ob 
j ect or to the background These rodel s use the pri nci pi e of mxi nnmentropy where 
little informtionis available before the imge is observed In Chapter 3, probabilis¬ 
tic mdels are developed that characterize the feature detection process. Rprical 
evi denee i s descri bed to support the choi ce of mdel. 

Chapter 4 discusses a way of obtaining average object edge features froma se¬ 
quence of observati ons of the obj ect ini nages. ftternhni sti c rodel s of the proj ecti on 
of features into the imge are discussed in Chapter 5. The proj ecti on nat hods used 
intlis vorkare linear inthe paranaters of the transfornations. Mthods for 2Dand 
3Dare discussed, including the linear (biia nation of Vews mthodof III nan and 
Rsri [ 71]. 

In Chapter 6 the above mdels are conbnedina ftyesianframworkto construct 
a criterion, Ml P Mdel Mtching , for evaluating hypotheses in obj ect recognition 
Intlis formlation, conplete hypotheses consist of a description of the correspon¬ 
dences betveenimge and obj ect features, as veil as the pose of the object. These 
hypotheses are evaluated by their posterior (after the imge is observed) probability 
Arecogii ti on experi mnt i s descri bed that uses the cri teri a to gui de a heuri sti c search 
over correspondences. Aconnect ion bet wen MP Mdel Mtching and a mthodof 
robust chanter mtching [47] is described 

Bilding on the abow, a second criterion is described in Chapter 7: Posteri or 
Marginal Pose Estirmtion (I Mil). lire, the solution being sought is sinply the 
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pose of the object. The posterior probability of poses is obtained by taking the 
formal marginal, oh all possible matches, of the posterior probali 1 i ty of the joint 
hypotheses of MPMdel Mtcbirg. This results in a smooth, non-linear objective 
function for emlrating poses. The smoothness of the objective function facilitates 
local search in pose space as a nachanismfor refining hypotheses in recognition. 
Som experi rental explorati ons of the obj ecti ve functi on i npose space are descri bed 
These characteri zati ons are carri ed out i n t w> cbnai ns: vi deo i nagery and syntbeti c 
radar range i nagery 

Chapter 8 descri bes use of the the Expect at i on- Mxi ni zat i on (Mjl al gori tbm[ 21] 
for finding local maxima of the ME obj ective function. This algorithmalternates 
bet wen the Mstep - a weighted least squares pose estimate, and the Estep - re¬ 
calculation of the wights based on a saturating non-linear function of the residuals. 

Thi s al gori thmi s used to refine and e’val uate poses i n 2Dand 3Drecogni ti on ex¬ 
peri r an ts that are descri bed i n Chapter 10. Ini ti al hypotheses for the 2Dexperi r an ts 
wre generated by a simple indexing method, Angl e Pat r Indexi ng } that is described 
in Chapter 9 . The linear GkM nation of Vew method of III ran and Rsri [71] is 
enplo^das the proj ecti on model in the 3Dexperi rents reported there. 

Enally, som conclusions are drawi in Chapter 11. The notation used throrghout 
i s sunnari zed i n ^apendi x A 
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Chapter 2 


Modeli ng Feature Cbrrespondence 


This chapter is concernedwthprobalilistic mdels of feature correspondences. These 
mdels wll ser’ve as priors in the statistical theories of object recognition that are 
described in Chapters 6 and 7, and are important components of those formulations. 

They are used to assess the probability that features correspond before the image data 
is compared to the object model. They capture the expectation that sona features 
i n an i rage are anti ci pated to be due to the obj ect 

Three different mdels of feature correspondence are described, one of -which is 
used i n the recogni ti on experi ran ts descri bed i n Chapters 6, 7, and 10. 


2.1 Feat ires and Correspondences 

This research focuses on feature-based obj ect rec ogn i ti on. The object being sought 
and the image being analyzed consist of discrete features. 

let the image that is to be analyzed be represented by a set of v -diransional 
point features 

Y = {Y u Y 2 ,...,Y n } , Y t eR v . 

Image features are discussed in rare detail in Chapters 3 and 5. 
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The obj ect to be recognized i s also described fy a set of features, 

M ! '/ ,..w, . \l ) . 

The features wll usual ly be represented fy real natrices. AHtional details on obj ect 
features appears in Chapters 4 and 5. 

In this work, the interpretation of the features in an inage is represented by the 
'variable T, which describes the rappingfroni rage features to obj ect features or the 
scene background This is also referred to as the correspondences. 

r={r 1 ,r 2 ,, T t ew{±} . 

In an interpretation, each i rage feature, Y wll be assigned either to sona obj ect 

feature M or to the background, which is denoted by the synhol _L This s;ynhol 
plays a rol e si nhl ar to that of the null character inthe interpretation trees of Gimon 
and Iozano- Krez [ 36]. Ai i nterpretati oni s i 11 ustrated i n E gure 2-1. Id s a col 1 ecti on 
of variables that is indexed in parallel wth the irage features. Sch varialie T 
represents the assignment of the corresponding irage feature Y n It ray tale on as 

value any of the object features M or the background, _L Thus, the naaning of the 
expression ]! 1 5 =M 6 is that irage feature five is assigned to obj ect feature six, likewse 
r 7 =Lmans that irage feature seven has been assigned to the background In an 
interpret at ion each irage feature is assigned, while sona object features ray not be. 

A3di ti onal 1 y, several i rage features ray be assi gned to the sana obj ect feature. Thi s 
representation allow irage interpretations that are implausible - other nachanism 
are used to encourage natri cal consi steney. 
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2.2 Ai Indeperdert Correspondence Model 

In this section a simple probabilistic mdel of correspondences is described The 
i ntent i s to capture som i nfornati on beari ng on correspondences before the i nage i s 
compared to the obj ect. Thi s rodel has been desi gned to be a reasonali e corponhse 
bet veen si npl i ci ty and accuracy. 

In this mdel, the correspondence status of differing inage features are assured 
to be independent, so that 

pffl= II/< r .) • 

i 

fibre, p(I) is a probability mss function on the discrete variable V. There is 
evidence against using statistical independence here, for example, occlusionis locally 
correl ated Independence i s used as an e ngi neeri ng approxi rati on that si npl i ffes the 
resulting formlations of rec pgn i tion. It my be justified by the good performance 
of the recognition experimnts described in Chapters 6, 7, and 10. Kwreccgnition 
system have used non-independent models of correspondence. I lend outlined one 
approachinhis thesis [9]. Arel axation of this assunptionis discussed in the followng 
section 

The component probability function is designed to characterize the amount of 
clutter in the image, but to be otbervise as non-committal as possible: 

B if C =i_ 

—- otberwse 

V m 

The j oint mdel p(I) is the mximmentropy probability function that is con¬ 
sistent wththe constraint that the probability of an image feature belonging to the 
background is B ilbaybe estimtedby taking si nple statistics on images fromtbe 
domain B =.9 w>uld man that 90 % of imge features are expected to be due to 
the background 

living inconstant during rec pgn i ti on is an approximtion The nurher of fea- 
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tures due to the obj ect vil 1 1 i kel y vary accord ng to the si ze of the obj ect i n the scene. 
iAoul d be esti rated at recogni ti on ti m hy pre- processi ng nachani sna that eval uate 
inage clutter, and factor in expectations about the size of the object. In practice, 
the approxi nation works veil in controlled situations. 

The independent correspondence mdel is used in the experimnts reported in 
this research 


2.3 AMrkov Gorespordence Mdel 

A indicated above, one inaccuracy of the independent correspondence nadel is that 
sample realizations of T drawi fromthe probability function of Equations 2.1 and 
2.2 wll tend to be overly fragmented in their mdeling of occlusion. This section 
describes a confronhse nadel that relaxes the independence assumption somvhat 
byallowngthe correspondence status of an inage feature (T to depend on that of 

its neighbors. In the domain of this research, inage features are fragments of inage 
edge curves. These features have a natural neighbor relation, adjacency along the 
inage edge curve, that ray be used for constructing a lDMrkov RmcbmEeld 
i MH i mdel of correspondences. NIFs are collections of randomvarialies vhose 
condi ti onal dependence i s restri cted to 1 i noted si ze nei ghborhoods. NHf mdel s are 
discussed by ©nan and ©ran [28]. The folloving describes an NHf mdel of 
correspondences intended to provide a rare accurate mdel of occlusion 


iV)=q(T 1 )iT 2 )...£T n )r 1 (T 1 ,T 2 )r 2 (T 2 ,T 3 )...r n - 1 (T n - U T n ) , (2.3) 


fir.) = 


where 


< 


ei if I) =h_ 
e 2 otherwse 


(2.4) 
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and 


ri(a, b) 


e 3 if a=Land&=L 
e 4 if a^Landfc /=d_ 
e 5 otherwse 
f 


> if features i andz + f are neighbors 


otherwse . 

( 2 . 5 ) 


The assignmnt of indices to inage features should be done in such a my that 
neighboring features bam adjacent indices. The functions r 4 -(-, • ) rndel the interac¬ 
tion of neighboring features. The paramters e i . . . nay be adjusted so that the 

probability function p(I) is consistent wth observed statistics on clutter and fre¬ 
quency of adjacent occlusions. AMtional 1 y, the parameters rust be constrained so 
that Rjuati on 2.3 actual 1 y descri bes a probali 1 i ty functi on. MMl these constrai nts 
arenat, the mdel wll be the i^nnmentropy probability function consistent wth 
the constraints. Satisfying the constraints is anon-trivial selectionproliemtbat my 
be approached i terati ml y Brtunatel y, tbi s cal cul ati on cbesn’t need to be carri ed out 
at recogni ti on ti m. (SI dmn [ 30] di scusses mtbods of cal cul ati ng these paramters. 

The mdel outlined in Equations 2.3 - 2.5 is a generalization of the Ising spin 
mdel. Ising mdels are used in statistical physics to mdel ferronagnetism[73]. 

Samples drawifromlsing mdels exhibit spatial clurpng whose scale depends on 
the paramters. In object rec ogn i ti on, this clurpng behavior my provide a rare 
accurate mdel of occlusion. 

The standard Ising mdel is sbowifor reference inthe foilowng equations. It has 
been restricted to IE) and has been adapted to the notation of this section. 


Oi £{ — 1 , 1 } 


p(<7icr 2 • • • (Jn) = -^4 cr l)4 cr 2) ■ ■ ■ (£a) Cr 2)K Cr 2, 0 - 3 ) •• • Kff- 1> a n) 
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4 a) 


exp( if a= 1 

exp(— otherwse 


exp( if a=b 

exp(— otherwse . 

Bre, Z is a normalization constant, fi is the mrant of the nagnetic dipoles, 

H is the strength of the applied nagnetic held, A; is Bbltznam’s constant, T is 
temperature, and J is a neighbor interaction constant called the exchange energy. 

The approach to mdeling correspondences that is described in this section vas 
outlined in Wls [74] [75]. Subsequently, Betel [9] described a sinhlar local interac- 
tionmdel of occlusioninconjunetionwthasirplifiedstatistical rodel of rec qgn i tion 
that used bool ean features i n a cl assi hcati on based scbena. 

The Mrkov correspondence rodel is not used in the experimnts reported in this 
research 


riqb) = < 


2.4 Incorporati rg Sal i erry 


Aether route to rare accurate radeling of correspondences is to exploit bottomup 
saliency processes to suggest -which imge features are mst likely to correspond to 
the obj ect. Qe such process i n descri bed by III ran and Sbasbua [ 66]. 

fbr concreteness, assure that the saliency process provide a per-feature masure 
of saliency, Si- T>incorporate this information, reconstruct p(T =1] S ,•)• This ray 

be conveniently calculated via Byes’ rule as follow: 


5,-) 


| h j =i) 

is t ) 


4S i | h =d) and 4S ;) are probability densities that ray be estimated from 
observed frequencies in training data. A in Section 2.2, ve set p(T i =L) =B 
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Afeature specific baclground probability nay then be defined as follow: 

B, = A r,=i| g,) = ^ gl ] < - r ‘, =J) B. 

rE i) 

In this case the complete probability function on T wll be 

ip if i; =± 

1 ~ Bl otherwse 

m 

This nadel is not used in the experiments described in this research 

2.5 Gbrrl isi oib 

The simplest of the three models described, the independent correspondence mdel, 
has been used to good effect i n the rec pgn i ti on experi r un ts described in Chapters 6, 7, 
and 10. In som domains additional robustness in recogni ti on light result frorrusing 
ei ther the Mrhov correspondence model, or by i neorporati ng sal i eney i nfornati on 





Chapter 3 


M)del i ng Image Features 


Kobalilistic mdels of inage features are the topic of this chapiter. These are an¬ 
other inportant component of the statistical theories of object recognition that are 
descri bed i n Chapters 6 and 7. 

The probability density funetionf or the coordinates of i rage features, conditioned 
on correspondences and pose, is defined The HFhas tw> inportant cases, depend¬ 
ing on whether the inage feature is assigned to the object, or to the background 
Katures natcbed to the object are modeled wth norml densities, while uniform 
densities are usedfor background features. Efprical evidence is providedto support 
the use of norml densities for natchedfeatures. Aformof stationarityis described 

Mny recognition system implicitly use uniformdensities (rather than norml 
densities) to mdel natched image features (bounded error mdels). The erprical 
evidence of Section 3.2.1 indicates that the norml mdel nay sonatinas be better. 
Because of this, use of norml mdels my provide better performance in recognition. 


29 



30 


CHAPTER3. MUELINGIMGE FEATURES 


3.1 AUriformMdel for Backgrouixl Entires 

The inage features, Y y are v di ron si onal vectors. Mabn assigned to the background, 
they are assured to be umfornhy distributed, 

ifr -= X ' («) 

(The H) i s defined to be zero outside the coordinate space of the inage features, 
which has extent W ; along dimnsionz.) T describes the correspondences froninage 
features to object features, and f3 describes the position and orientation, or pose of 
the object. Br example, if the image features are 2Dpoints in a 640 by 480 inage, 
then ]iY | _4 /$ = 640 * 480 , wthi n the i rage. Br Y this probability function depends 
only on the z’ th component of T. 

Kovidirgasatisfyirgprobalilitydensityfunetionfor bachgroundfeatures is prob¬ 
lematical. Equation 3.1 describes the naximmentropy 41b' consistent wth the 
constraint that the coordinates of inage features are always expected to lie wtbin 
the coordinate space of the inage features. ET Jaynes [46] has argued that naxi- 
mrnentropy distributions are the most honest representation of a state of incomplete 
knowledge. 


3.2 ANorml Mdel for Mfcched Eat ires 

Image features that are matched to object features are assured to be normally dis¬ 
tributed about their predicted position in the inage, 

iY t I r,/$ =G ^(Y if r t - =Mj . (3.2) 

Hre Y T, and (3 are defined as above. 

is the ndimensional (Russian probability density function wth covariance 
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Egure 3-1: Ene Inage Batures anclEne Mdel E?atures 


mtrix '0 ij, 

G 4 , tj (x) =(2 tt) ~%\iij- 1 ? ejq>(- . 

The cowi aixe mtri x 0 t j i s cl scussed rare ful 1 y i n Secti on 3.3. 

Mbn T i = M j, tlx predicted coord nates of inage feature V aix given by 

Fi IF. L tlx proj ecti on of object feature j into tlx inage wth obj ect pose /J Ko- 
j ecti on and pose are cl scussed i n rare cletai 1 i n Chapter 5. 

321 Erjiriccl Brickie fa* the Nani Mil 

His section describes som erprical evidence fromtlx cbmnof video inage edge 
features iixleating that nornal probability cbnsities are goodracbls of feature fluc¬ 
tuations, aixl that tlxy can lx better than uni for improbability cbnsities. TTx ev¬ 
idence is provided in tlx formof observed and fitted cumlati’ve distributions and 
Ibl mgprov- Snhrnov tests. Tlx racbl cl stri buti ons vere fitted to tlx data usi ng tlx 
Mxi mmB kel i hood rat bod 

TTx data that is analyzed are tlx perpendicular and parallel deviations of fix 
and coarse edge features derived fronivicbo inages. TTx fix and coarse features are 
sbowi i n E gures 3-1 and 3- 3 respecti vel y 

TTx mdel features are fromMan Rlge Inages, tlxse are described in Section 
4.4. lx eefee operator usedinobtaiiingtlxinagefeatures is ridges in the nagiitude 
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of the i nage grad ent, as d scussed i n Secti on 4.4. The sroothi ng standard devi ati on 
used i n the edge detecti on was 2.0 and 4.0 pi xel s respecti vel y, for the fine and coarse 
features. These features were also used in the experinsnts reported in Section 40. 4, 
and the correspondences were used there as training data. 

K>r the analysis in this section, the feature data consists of the average of the 
x and y coord nates of the pixels fromecjge curve frag ran t s - they are 2D point 
features. The features are dspla^das circular arc fragmnts for clarity The edge 
curves were broken arbitrarily into 10 and 20 pixel frag ran t,s for the fine and coarse 
features respectively 

Gknespondences frominage features to mdel features were established by a 
neutral subject using a muse. These correspondences are indcated by heavy lines 
in Egures 3-2 and 3-4. Rrpendcular and parallel deviations of the correspond ng 
features were calculated vith respect to the nornals to edge curves at the inage 
features. 

Egure 3-5 show the emulative dstributions of the perpendcular and parallel 
deviations of the fine features. The emulative dstributions of fittednornal densities 
areplottedas heavy dots over the obeerveddstributions. The dstributions vere fitted 
to the data usi ng the Mxi immE keli hood rathod - the man and vari anee of the 
nornal densi ty are set to the man and vari anee of the data These figures showgood 
agreerant betveen the observed dstributions, and the fitted nor ml dstributions. 
Snhlar observed and fitted dstributions for the coarse deviations are shovninEgure 
3-6, again wth good agreemnt. 

The observed emulative dstributions are shovn again in Egures 3-7 and 3-8, 
this tira wth the emulative dstributions of fitted uniformdensities over-plotted 
in heavy dots. A before, the uni formdensi ties vere fitted to the data using the 
MdmrnEMihoodmthod-inthis case the uni formdensi ties are adj ustedto j ust 
include the extrera data These figures showrelati vely poor agreemnt betveen the 
observed and fitted dstributions, incorparisonto norral densities. 

















1 Tinf ormGirnl ati \e Qs- 
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Bviate 

N 

Mrnal IJpothesis 

lin f ormljpothesi s 

D 0 

P{D> D 0 ) 

D 0 

KD>D 0 ) 

Ene Rrpendi cul ar 

118 

.0824 

.3996 

.2244 

.000014 

Ene Rrallel 

118 

.0771 

.4845 

.1596 

.0049 

(barse Iferpendi cul ar 

28 

.1526 

.5317 

.2518 

.0574 

(barse Rrallel 

28 

.0948 

.9628 

.1543 

.5172 


Eli e 3.1: K)1 rqgorov- Snhrnov Est s 


Kolrogprov-Snhrnov Tests 


The K)1 ixgorov-Siirrov (IS) test [59] is one my of analyzing the agreemnt be- 
tvron observed and fitted cumnlati’ve distributions, such as the ones inEgures 3-5 
to 3-8. The IS test is computed on the magnitude of the largest difference between 
the observed and hypothesized (fitted) distributions. This wll be referred to as D 
The probali 1 i ty cli stri buti on on tbi s di stance, under the hypothesi s that the data were 
dravnfromtbe hypotbesi zed di stri buti on, can be cal cul ated Ai asymptoti c form! a 
is given by 

I(I>D 0 ) =Q{ VnD 0 ) 

where 

OO 

0 $ =2 XlH) J_1 esp(-2.7 V) , 

3 = 1 

and D o is the observed Wue of D 

The results of IS tests of the consistency of the data wth fitted normal and 
uni formdi stri buti ons are shown in Table 3.1. Iowvalues of I{D> D 0 ) suggest 

incompatibility between the data and the hypothesized distribution In the cases 
of fine perpendicular and parallel deviations, and coarse perpendicular deviations, 
refutation of the umforinnodel is strongly indicated Strong contradictions of the 
fitted normal mdels are not indicated in any of the cases. 
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3.3 Gierted Stationary Statistics 

The covariance natrix ij that appears in the rndel of natched inage features in 

Rjuati on 3.2 i s al 1 oved to depend on both the i rage feature and the obj ect feature 
involvedinthe correspondence. Indexing onz allow dependence onthe inagefeature 
detecti on process, whi 1 e i ndexi ng i n j al 1 ow dependence on the i denti ty of the rndel 
feature. This is useful wbensona rndel features are knowto be noisier than others. 

This flexibility is carri ed thro ugh the fornal i srrof later chapters. Athoughsuchflex- 
ilility can be useful, substantial sinplification results by assuring that the features 
statistics are stationary in the inage, i.e. tfj i j for all ij. This coul d be reason¬ 
able if the feature fluctuations vere isotropic in the inage, for exanple. In its strict 
formthis assurption ray be too liriting, hovever. This section outlines a corpx> 
rise approach, oriented stationary statistics, that was used in the inpl emit at ions 
described in Chapters 6, 7, and 8. 

This mthod involves attaching a coordinate systemto each irage feature. The 
coordinate systembas its origin at the point location of the feature, and is oriented 
wth respect to the direction of the underlying curve at the feature point. Mabn 
(stationary) statistics on feature deviations are masured, they are talen relative to 
these coordinate system. 


331 Etirrtiig the Parries 

The experimnts reported in Sections 6.2, 7.1, and Chapter 10 use the norml mdel 
and oriented stationary statistics for natched inage features. Ater this choice of 
mdel, it is still necessary to supply the specific paramters for the mdel, namly, 
the covariance natrices, of the norml densities. 

The paramters vere esti rated fromobservations on natches done by band on 
sanple inages fromthe cbnain Efcause of the stationarity assunptionit is possible 
to esti nate the conoosn covari ance, by obeervi ng natch data on one i nage. E>r 
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this purpose, a natch was done wthamuse bet veen features fromaManKjge Im 
age (these are descri bed i n Secti on 4.4) and a representati re i nage fromthe cbnai n 
Bring this process, the pose of the object was the sam in the tw> inages. This 
produced a set of corresponding edge features. Ear the sake of exarple, the process 
wl 1 be descri bed for 2Dpoi nt features (descri bed i n Secti on 5.2). The procedure has 
also been used wth2Dpoint-radius features and2Doriented-range features, that are 
descri bed i n Secti ons 5.3 and 5.4 respecti vel y 

let the observed i nage features be descri bed fy Y and the corresponding naan 

mdel features fy Y^. The observedresiduals betreentbe “data” inage features, and 
the “man” features are A =Y i — Yi- 

The features are derived fromecjge data, and the underlying edge curve has an 
orientation angle in the inage. These angles are used to define coordinate system 
specific to each inage feature Y These coordinate system define rotation natrices 

Ri that are usedtotransfornthe residuals into the coordinate system of the features, 
in the foilowng way: A ' =R ,A,-. 

The stationary covariance natrix of the natched feature fluctuations observed 
in the feature coordinate system is then estirated using the Mxiimmlikelihood 
mthod, as follow, 


<p= 

n 


/ T 


% 

Hre T denotes the natrix transpose operation. This technique has som lias, but 
for the reasonaliy 1 arge sarple sizes involved (nss 100) the effect is nhnor. 

The resul ti ng covari anee natri ces typi cal 1 y i ndi cate 1 arger wi anee for devi ati ons 


along the edge curve than perpendicular to it, as suggested by the data in Egures 
3- 5 and 3- 6. 



42 


CHAPTER3. MUELINGIMGE FEATURES 


332 Speddizirg the Caaiaie 

A rec qgn i ti on ti m, it is necessary to specialize the constant cowiance to each inage 
feature. This is cbne by rotating it to orient it wth respect to the inage feature. 
Acowianee natrix transform like the followng product of residuals: 

A'A' T . 

I l 

This is transformd back to the inage systemas follow, 

Rf A- A- % . 

Thus the constant cowi ance i s speci al i zed to the i rage features i n the f ol 1 owng my, 

t-A j R i ^JR i • 



Chapter 4 


M)del i ng Object s 


Maht is needed fromobject mdels? Br recognition, the min issue lies in predicting 
the imge features that wll appear inanimgeof the object. Should the object mdel 
be a mn ol i thi c 3Ddata structure? Ater all, the object itself is 3D In this chapter, 
som pros and cons of mn ol i thi c 3Dmdels are outlined Ai alternative approach, 
interpolation of view, is proposed The related proliemof obtaining the object 
mdel data is discussed, and it is proposed that the object mdel data be obtained 
by taki ng pi ctures of the obj ect. Ai automti c mtbod for thi s purpose i s descri bed 
Alditionally, a mans of edge detection that captures the average edges of an obj ect 
is described 


4.1 MrolitKc 3D Object Mdels 

Qe mtimtionfor using 3Dobject mdels in recognition system is the observation 
that conputer graphics techniques can be used to synthesize convi nci ng i mges from 
3Dmdel s i n any pose desi red 

Br som obj ect s, bavi ng a si ngl e 3Dmdel seemanatural cboicefor arecognition 
system If the object is polygonal, andis representedbyalist of 3Dline segmnts and 
vertices, then predicting the features that wll appear in a givenbigh resolution view 
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is a simple ratter. A1 that is needed is to apply a pose dependent transformation to 
eachfeature, and to performa visilility test. 

K>r other objects, such as snootily curved obj ects, the situationis different. Be- 
di cti ng features beconas more el aborate. In vi deo i nagery, occl udi ng edges (or l i mbs) 
are often important features. Calculating the linbof a smooth 3Dsurface is usually 
complicated. Rrnce and Kiegnan [58] describe an approach for objects mdeled 
by paramtric surface patches. Agefraic eliiimation theory is used to relate image 
linhs to the model surfaces that generated them Rooks’ vision systerp Anonym 
[ 10], al so recogni zed curved obj ect s fromi nage 1 i nhs. It used general i zed cyl i nders 
to model objects. Adrawbackof this approachis that it is awkmrdto realistically 
modeling typical objects, like telephones or automobiles, vithgeneralized cylinders. 

Redi cti ng reduced resol uti on i mge features i s another di fEul ty vith nosnol i til c 
3Dnodels. This is a drawback because doing recognition vith reduced resolution 
features is an attractive strategy: vith fever features less search vill be needed. Qe 
solution would be to devise a my of smoothing 3Dobject models such that simple 
proj ection operations woul d accuratel y predi ct reduced resolution edge features. N> 
such mthod i s knovn to the author. 

Rtecting reduced resolution image features is straightforward. God ec^e fea¬ 
tures of this sort my be obtained by smoothing the grayscale image before using an 
edge operator. This mthod is co nrunm ly used vith the Gknny edge operator [13], 
and vith the Mrr- HI dreth operator [ 53]. 

Ay al ternati ve approach i s to cb proj ect ions of the obj ect model at full resolution, 
and then to cb som kind of smoothing of the image. It isn’t clear what sort of 
snoothi ng voul d be needed. Gbe possi lilityis tocb pbotonatri cal 1 y real i sti c proj ec¬ 
ti ons (for example by ray tracing rendering), performsmothing in the image, and 
then use the same feature detection schema as is used on the images presented for 
recogni ti on Thi s mthod is 1 i M y to be too expensi ve f or practi cal recogni ti on system 
that need to perform!arge amounts of prediction Rrhaps better ways of doing this 
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wll be found 

Self occlusion is an additional complexity of the mnolitlic 3Dmdel approach. 
In computer graphics there are several mys of dealing wth this issue, anong them 
hidden line and z-buffer mtbods. These mthods are fairly expensive, at least in 
comparison to sparse point projections. 

In summary, mnolithic 3Dobject mdels address som of the requiremnts for 
predicting irages for recognition, but the computational cost ray be high 


4.2 Irterpol ati on of Vi ew 

Qe approach to ami di ng the di ffiul ti es di scussed i n the previ ous secti on i s to use an 
i rage- based approach to obj ect modeling. Ill ran and Rsri [71] bare di scussed such 
approaches. There i s som li ol ogi cal evi denee that ani ml vi si on system hare recog¬ 
nition subsystem that are attuned to specific view of faces [25]. This my provide 
som assurance that imge-based approaches to recognition aren’t unreasonable. 

Ai important issue wth imge-based object mdeling concerns howto predict 
imge features in a way that covers the space of poses that the object my assure. 

R>dies undergoing rigid nation in space hare six degrees of freecbng three in 
translation, and three in rot at ion This sixparamter pose space mybe split intotw) 
parts-the first part bei ngtranslati on and i n i mge-pi ane rotations (four parameters) 

- the second part being out of imge-plane rotations (tw> paramters: the “view 
sphere”). 

Synthesizing view of an obj ect that span the first part of pose space can often 
be done using simple andeffiient linear mthods of translation, rotation, and scale 
i n the pi ane. Tbi s approach can be preci se under ortbqgrapbi c proj ecti on wth seal - 
i ng, and accurate enough i n som cbmi ns wth perspecti ve proj ecti on Rrspecti re 
projection is often approximated in recognition system by 3D rotation confined 
wth orthographic projection and scaling. This has been called the weak perspective 
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approxi nati on [ 70]. 

The second part of pose space, out of plane rotation, is mre conpiicated The 
approach advocated i n thi s research invol ves tessel ati ng the vi ewsphere around the 
object, and storing a view of the object for each vertex of the tessel ati on. Alitrary 
view wll then entail, at mst, snail out of plane rotations fromstored view. These 
view nay be synthesized using interpolation. The linear Gkii nation of Vew 
mthodof III nan and Rsri [71], works veil for interpolating bet veen near by view 
(andmre distant ones, as veil). 

(bneeptually, the interpolation of dew mthod caches pre-confuted predict ions 
of irages, saving the expense of repeatedly conputing themduring recognition. If 
the tessel ati on is dense eno ugh , diffiulties oving to large changes in aspect ray be 
avoided. 

Betel [9] advocates a view based approach to mdeling, vithout interpolation. 


4.3 Object Mdels fromOteervation 

hbwcan object mdel features be acquired for use in the interpolation of view 
framvork? If a detailed GDrodel of the object is availalie, then view light be 
synthesized using graphical renderi ng prograna (this approach vas usedinthe (single 
vi ew) 1 aser radar experi ri an t descri bed i n Secti on 7.3). 

Aether mthodis to use the object itself as its ova mdel, and to acquire view 
by taking pictures of the object. This process can rake use of the feature extraction 
mthod that is used on irages at recognition tim. At advantage of this schemis 
that an accurate GDstyl e mdel i si t needed. IJi ng the run- ti m feature extracti on 
mchanismof the recogni ti on systemautonati cally selects the features that vill be 
salient at recognition tim, which is othervise a potentially diffiult problem 

Qe diffiulty vith the mdels fromobeervation approach is that inage features 
tendto be somvbat unstali e. Rr exanpl e, the presence andl ocati on of edge features 
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is influeneedby ill unhmat ion condi t ions, as illustrated in the followng fgures. Egure 
4-1 show a series of nine grayscale images where the only variation is in lighting. A 
correspond ng set of edge i nages i s showni n4- 2. The edge operator usedi n prepari ng 
the images is described in Section 4.4. The standard deviation of the smoothing 
operator ras 2 pixels. 


4.4 Man Ecjge I rages 

It was pointed out abow that the instability of edge features is a potential dffhity 
of acquiring object model features frornobservation. The ManKjge Image nathod 
solves this proli emby raid ng edge raps that are averaged oh variations due to 
i 11 unhnati on changes. 

Bightness edges ray be characterized as the ridges of a measure of brightness 
variation. This is consistent viththe conrm notion that edges are the lDloci of 
raxinaof changes inbrightness. The edge operator usedinEgure 4-2 is an example 
of this style of edge detector. It is a ridge operator applied to the squared dscrete 
gradent of smoothed images. Hre, the squared dscrete gradent is the measure of 
brightness variation. This style of edge detection was described by Mrcer [57]. The 
mathematical definition of the ridge predcate is that the gradent is perpend cular to 
the d recti on havi ng the most negati ve second d recti onal deri vatiw. Aether si lil ar 
defini ti on of edges was proposed Hrali ck [ 37]. K>r a general survey of edge detecti on 
methods, see Robot Vision , by Urn [39]. 

The preced ng characteri zati on of i mage edges general i zes natural 1 y to man edges. 

Man edges are defined to be ridges in the average measure of brightness fluctuation. 

In this w>rk, average brightness fluctuation oh a set of pictures is obtained by 
averaging the squared dscrete gradent of the (smoothed) images. 

Egure 4-3 show the awragedsquaredgradent of smoothedversions of the i mages 
that appear i n E gure 4-1. Ifecal 1 that onl y the 1 i ghti ng changedbetweenthese i rages. 
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E gure 4-1: Gayscal e Inages 
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Egure 4-3: Aeraged Squared Gael ent of Smotlied Inages 


Egure 4-4 show the riches fromtlx inage of figure 4-3. Ifsteresis thresbolclng 
leased on the magnitude of tlx areraged sejuared graded lias been used to suppress 
veak edges. Such hysteresis thresbolclng is used wth tlx (hnnyedge operator. Mte 
that tlis edge inage is relati vely inure to specular highlights, in comparison to the 
i lxl vi dial edge i nages of E gure 4- 4. 


4.5 A±omtic 3D Object Mdel A^qiisition 

This section outlines anathodfor automatic 3Dobject mdel accjuisitionthat com 
lires interpolation of viev® aixlManRlge Images. Tlx mthod imoires automati¬ 
cal 1 y accjui ri ng (my) pictures of tlx object under various codanations of pxse aixl 
i 11 ui iin ati on. Apielinhnary inplenantationof tlx mthodms usedto accjuire obj ect 
mdel features for tlx 3Drecogiitionexperimnt clscussedin Section 10.4. 
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iigrire 4- 5: Aftntakis Ebdecahedron 


Ihe obj ect, a ]i asti c cai’ racfel, ms Daunted on the tool flarge of a H XNAiO 
robot.. A video carara connected to a Sun Mcrosystena MC video clgitizer ms 
taunted near the robot. 

I dr the purpose of Interpolation of Vevs object nadel construction, the view 
sphere around the obj ect ms tessel atecl into 32 A ewpoi nts, the verti ces of a pent aid s 
dodecahedron (one i s i 11 ust rated i n E gure, 4- 5). A each vi ewpoi nt a “caned cal pose” 
for the object ms constructed that oriented the viewpoint towards the carara, while, 
keeping the center of the object in a tied posit ion 

Nne different configurations of lighting vere arrarged for the purpose of con¬ 
structing Man Rigs Inages. The lighting confgurations ware mi ly rnvirg a 
spotlight to nine different position that i 11 ri mn ated the object. The lanp positions 
ro ngh l y covered the viewbenhspfaere centered on the carara. 

The object ms raved to the caned cal poses corresponding to the 21 vertices in 
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the upper part (roughly 2/3) of the object’s viewsphere. A each of these poses, 
pictures vere taken wth each of the nine lamp positions. 

Man Kjge Inages at various scales of smothing vere constructed for each of 
the canonical poses. Object rndel features for rec pgn i ti on experi r en ts described in 
Chapter 8 vere derived fromthese Man Kjge Inages. Twenty of the irages from 
one such set of Man Kjge Inages are di spi ayed i n K gures 4- 6 and 4- 7. 

Tw> of these Man Kjge Inages vere used in an experimnt in 3Drecognition 
using a tvo-viewlinear Gain nation of Vew mthod This mthod requires corre¬ 
spondences a mng features at di fieri rg view. These correspondences vereestaliished 
by hand, usi ng a muse. 

It is liMy that such feature correspondence could be derived fromthe results 
of a nation program Shashua’s nation program[65], which coniines geonatry 
and optical fbwf was tested on irages fromthe experimntal setup and was alie 
to estaliish good correspondences at the pixel level, for view separated ly 4.75 
degrees. This range could be increased by a sequential bootstrapping process. If 
correspondences can be autonatically deternhned, then the entire process of building 
vi ewbased nadels for 3Dobjects can be mde fully autonatic. 

Ater porfornhng the experimnts reportedinChapter 10, it becana apparent that 
the view vere separated by too large of an angle (about 38 degrees) for establishing 
agoodamunt of feature correspondence betveensom view. This proliemnaybe 
rel i eved by usi ng rare vi ew. lii ng rare vi ew al so rates autonati c deternhnati on 
of correspondences easier. If the process of mdel construction is fully autonatic, 
havi ng a rel ati vel y 1 arge number of vi ew i s potenti al 1 y worteli e. 

The work of fRylor and Reves [69] provides som evidence for the feasibility of 
nnl ti pi e- vi ew based recogni ti on They descri be a cl assi fbcati on- based vi si on system 
that uses a library of view froma 252 vertex icosahedron-based tesselation of the 
viewsphere. Their view vere separated by 6.0 to 8.7 degrees. They report gpod 
classification of aircraft silhouettes using this approach 
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Egure4-7: Man Rlge Inages at Gnoiical Rses 
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Chapter 5 


M)del i ng Proj ection 


This chapter is concerned vith the representations of inage and object features, and 
wth the projection of object features into the inage, given the pose of the object. 
Bur different formulations are described, three of which are used in experiments 
reported in other chapters. 

The first three mdels described in this chapter are essentially 2E) the trans¬ 
formations comprise translation, rotation, and scaling in the plane. Such methods 
maybe used for single view of 3Dobjects via the veak perspective approximation, 
as described in [70]. In this scheme, perspective projection is approximated by or¬ 
thographic projection wth scaling. Whin this approximation, these methods can 
handle four of the six parameters of rigid body nation- everything but out of plane 
rotations. 

The method described in Section 5.5, is based on Iinear GJnbination of Vew, 
a vi ewbased 3Dmethod that was devel oped lay III man and Rsri [ 71]. 


5.1 Linear H*oj ecti on Mdels 

Bse deteriimationis often a component of medel - based obj ect recognition system, 
including the system descri bedintlis thesis. R»e deteriimationis frequentlyfrarad 
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as ail optinhzation problem The pose deternhnation proliemnay be significantly 
si npli fed if the feature proj ecti onmdel i s 1 i near i n the pose vector. The system de- 
scribedinthis thesis use proj ecti on mdel s having this property, this enables solving 
the erhedded optinhzation proliemusing least squares. least squares is advanta¬ 
geous because unique solutions ray be obtained easily in closed form This is a 
significant advantage, since the enheddedoptinhzationproliends sol vednany tines 
during the course of a search for an object in a scene. 

A1 of the forndations of proj ecti on descri bed beloware linear in the paramters 
of the transformtion ftcause of this they my be written in the folloving form 

ru=T{M „/$=M t f3. 

The pose of the object is represented ly @ a col urn vector, the object mdel 
feature by M a natrix. 77 the projection of the mdel feature into the inage by 
pose @ is a col urn vector. 

A tho ugh thi s particular formnay seemodd, it a mtural one if the focus is on 
solving for the pose and the object mdel features are constants. 


5.2 2DR>i it Eat ire Mdel 


The first, and si npl est, mthod to be descri bed ras used by Bugeras and i^acbe i n 
their vision systemffiFERf 1]. It is defined as follow: t] i =M where 


p'ix 

M t = 

Pi x 

~P i y 

1 

0 

1 - 

1 _ 


1 

Pi x 

0 

1 


and /3- 


P 

v 


( 5 . 1 ) 


The coordinates of object mdel point i are p 


i X 3DCl P i y 


The coordinates of the 
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mdel point z, projected into the inage bypose $ arep ' ix sndp This transformation 

is equivalent to rotation by 6, scaling tys, and translation by T, where 


T 



6 =arctan — 


This representation has an nn-sy rrot ri cal my of representing the tw> classes 
of features, which seem odd due to their essential equimlence, however the trick 
facilitates the linear fornalation of project ion given in Equation 5.1. 

In thi s mdel, rotati on and seal e are effected by anal ogy to the nd ti pi i cati on of 
conjiexnurhers, whichinduces transfornations of rotati on and scale inthe conplex 
plane. This analogy ray be mde conplete by noting that the algetra of convex 
nurhers a-\-ib is isomrpfaic wththat of mtrices of the form 


a b 
-b a 


5.3 2DR>i rt-Ricfi lb Eat ire Mdel 

This section describes an extension of the previous feature mdel that incorporates 
i nf ornati on about the nornal and curvature at a poi nt on a curve (i n add ti on to the 
coordnate information). 

There are ackantages i n usi ng ri cher features i n recogni ti on - they provi de rare 
constraints, and can lead to space and tim effiiencies. These potential advantages 
nast be weighed against the practicality of detectingthe richer features. K>r exanple, 
there i s i neenti ve to construct features i neorporati ng hi gher deri vati ve i nformti on at 
a point on a curve; however, masuri ng hi gher derivatives of curves derivedfromddeo 
i mgery i s probali y i qracti cal, because each deri vati ve nagni hes the noi se present 
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Egure5-1: Rlge Girve, Qculating Grcle, and Rdus \fctor 


in tlx data. 

The featuix descri Ixd lxre i s a conpronhse between ri driess aixl detectali 1 i ty It 
is defined as follow r/ ; =M where 



Pi x 

“~P iy 

l 

0 

Pi y 

Pi x 

0 

1 

*"/’ X 

i y 

0 

0 

c i y 

Ci x 

0 

0 



Tlx point coord nates and flare as above. c ia . aixlc ; y represent tlx rad ns vector 

of tlx curve’s osculating circle tint touches tlx point on tlx curve, as illustrated 
in Egure .5-1. His vector is nornal to tlx curve. Its length is tlx inverse of tlx 
err wit ire at tlx poi nt. The counterparts i n the i nage aie gi ven ly c ' x aixl c ' y 1th 

tlis mdel, tlx rad us vector c rotates aixl scales as cb tlx coord nates p but it does 
lxt translate. Thus, the aggregate feature translates, rotates aixl scales correctly. 

This feat ire mdel is used in tlx experi nsnts described in Sections 6.2, 7.4, aixl 
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10.1 Mfen the underlying curvature goes to zero, the length of the radius vector 
diverges, and the direction becomes unstable. This has been accommodated in the 
experi ran ts by truncating c A tho ugh thi s violates the “transform correctly” crite¬ 
rion, the mdel still works well. 

5.4 2DQ‘i erted-Ruige Eat ire Mdel 

This feature proj ection mdel is verysinhlar to the one described previously. It was 
desi gned for use i n range i nagery i nstead of vi deo i nagery. Ii ke the previ ous feature, 
it is fitted to fragmnts of inage edge curves. In this case, the edges label discon¬ 
tinuities in range. It is defined just as above in Section 5.3, but the interpretation 
of cis different. The point coordinates and f3 are as above. A above, c i x and c 

are a vector whose direction is perpendicular to the (range discontinuity) curve frag- 
mnt. The difference is that rather than encoding the inverse of the curvature, the 
1 ength of the vector encodes i nstead the i nverse of the range at the di sconti mi ty The 
counterparts in the irage are gi ven by c ' ix and c ' ^ The aggregate feature translates, 
rotates and scales correctly when used wthi raging mdel s where the obj ect features 
scale according to the inverse of the distance to the object. This holds under per- 
specti ve proj ecti on vith attached range 1 abel s when the obj ect i s sral 1 corpared to 
the distance to the object. 

Thi s mdel was used i n the experi ran ts descri bed i n Secti on 7.3. 

5.5 linear (Mnnation of Vew 

The technique used in the above natbods for synthesizing rotation and scale amunts 
to raking linear corMnations of the object mdel vith a copy of it that has been 
rotated 90 degrees i n the pi ane. 

In their paper, “Kcognit ion by linear (bntfl nation of Mdels” [71], III ran and 
Rsri describe a schema for synthesizing view under 3Dorthqgrapiiy vith rotation 
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and scale that has a linear paranaterization. They showthat the space of irages of 
an object is a sufcepace of a linear space that is spanned by the corponents of a few 
inages of an object. They discuss variants of their forndationthat are based on tw> 
view, and on three and rare view. Rcowring conventional pose paranaters from 
the 1 i near cord nati on coeffii ents i s descri bed i n [ 60]. 

The f ol 1 owng is a fcri ef expl anati on of the t w> vi ewmthod The reader i s referred 
to [71] for a fuller description. Ibint proj ectionfrom3Dto 2Dunder orthography, ro¬ 
tation, andscale is alinear transformation. If tv© (21) view are availalie, alongwth 
the transformtions that produced them(as in stereo vision), then there is enough 
data to invert the transformtions and solve for the 3Dcoordinates (three equations 
are needed, four are amiable). The resulting expression for the 3Dcoordinates vill 
be a linear equation in the corponents of the tw> view. .\Kv2l hi ew my then 
be syntbesizedfromtbe 3Dcoordimtes fyyet another linear transformtion. Gkn 
pounding these linear operations yields an expression for new2Dview that is linear 
in the corponents of the original tv© view. There is a quadratic constraint on the 
3Dto 2Dtransformtions, due to the constraints on rot at i on mt rices. Theusual lin¬ 
ear (bntflnation of Vew approach rates use of the above linearity property while 
synthesizing newview vith general linear transformtions (vithout the constraints). 
This practice leads to tw> extra paranaters that control stretching transformtions 
i n the synthesi zed i mge. It al so reduces the need to deal vith camra cal i Irati ons - 
the pixel aspect ratio my be accomodated in the stretching transformtions. 

Thefollovirgprojectionmdel uses atw> viewvariant of the linear GteM nation 
of Vew mthod to synthesize view vith 1 i noted 3Drot at i on and scale. AHtionally, 
transl ati on has been added i n a strai ghtf orvard my t] i =M where 

Pi x 0 Vi x 0 P i y ^ 10 

0 P i y 0 Vi y 0 P i x 0 1 




Vi x 

Mi = 

1- 

i_ 



T = 
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and 


A 


X 


A) ft 1A (3 3 /?4 A /?6 A 7 


The coordinates of the i’th point in one view are p ^ and p in the other view 

they are <7 , ^ and 5 , v 

Mabnthis projection model is used, /3cbes not in general describe rigid transfor¬ 
mation, but it is nevertheless cal led the pose vector for rotational consistency. 

Thi s mtbod i s used i n the experi mnt descri bed i n Secti on 10.4. 
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Chapter 6 


MVP M) de 1 Matching 


MPMdel Mtching 1 (MSjlis the first of tvo statistical forrdations of object 

recognition to be discussed in this thesis. It builds on the rodels of features and 
correspondences, objects, and projection that are described in the previous chapters. 
MMrvaluates joint hypotheses of natch and pose in term of their posterior prob¬ 
ability, given an irage. MMis the starting point for the second forimlation of 
object recognition, Rrsterior Mrginal Bse f&ti rati on (IMlij. vhichis described 
in Chapter 7. 

The MMobjective function is amnalie to search in correspondence space, 
the space of all possible assignmnts frominage features to rodel and background 
features. This style of search has been used in nany recognition system, and it is 
used here in a recognition experimnt involving lowresolution edge features. 

It is shovnthat under certain conditions, searching in pose space for naxinaof 
the MMobjective function is equivalent to robust mtbods of charier catching 
[ 47 ], 


1 Early versions of this work appeared in [74] and [75]. 
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6.1 Objective Rmctionfor Rise and Correspon¬ 
dences 


In this section an objective function for evaluating joint hypotheses of natch and 
pose using the MPcriterion wl 1 be derived 

Sie%, probability densities of inage features, conditioned on the paramters of 
natch and pose (“the paranaters”), are coiiined wth prior probabilities on the 
paramters using Ryes’ rule. The result is a posterior probability density on the pa¬ 
ramters, gi ven an observed i mge. Aiestinateof the paramters is tbenfornnlated 
by choosing tbemso as to mxinhze their a-posteriori probability (Hnee the term 
NAP. See ftck and And d’s textbook [4] f or a di scussi on of MPesti nati on ) MP 
estimators are especially practical vhen used wth norml probability densities. 

This research focuses on feature based recognition The probabilistic nodels of 
i mge features descri bed i n Chapter 3 are used Ini ti al 1 y, i mge features are assured 
to be mtually independent (this is relaxed in Section 6.1.1). AHtionally, mtcbed 
imgefeatures are assured to be normlly distributed about their predicted positions 
in the imge, and unmtcbed (background) features are assured to be umfornhy 
distributed in the imge. These densities are coiiined wth a prior mdel of the 
paramters. Madi a linear proj ect ion mdel is used, a si nple obj ective function for 
mtch and pose results. 

A described in Chapter 2, the imge that is to be analyzed is represented by a 
set of ndimnsional col urn vectors. 


Y={Y 1 ,Y 2 ,...,Y ll j , 1C- eR v 


The obj ect mdel i s denoted by 


M={M . \l ) 
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Mahnlinear proj ect ion mdel s are used, as discussedin Chapter 5, the object features 
wll be represented ly real natrices: M j £R vXz (z is defined bel owj. 

The paramters to be estirated in natcbing are the correspondences between 
inage and object features, and the pose of the object in the inage. A discussed in 
Section 2.1, the state of natch, or correspondences, is described by the wialie b 

r={r l 5 r 2 ,, r. 

Tire T i =M j mans that inage feature i corresponds to object rndel feature j, and 
I\- =Lmans that imge feature i is due to the background. 

The pose of the object is areal vector: fJ^R z . Apr oj ect ion function, 7T(), 
obj ect mdel features i nto the v- di mnsi onal i rage coordi nate space accordi ng to the 
pose, 

T\Mi,freR v . 

The probali 1 i sti c mdel s of i rage features descri bed i n Chapter 3 ray be wi tten 
as follow: 

iY t I r,/$ = 

where 

G^ 3 (x) =(2ii) ~%\. 

Tire i jis the cowiance mtrixassociatedwth inage feature i and obj ect mdel 
feature j. Thus i nage features ari si ng fromthe background are uni fornhy di stri buted 
o\er the imge feature coordi nate space (the extent of the imge feature coordi nate 
space along di mnsi on i is gjmnby W ;), and natchedimge features are normlly 
di stri buted about thei r predi cted 1 ocati ons i n the i nage. In sona appl i cati ons r/voii d 
be independent if i and j - an assumption that the feature statistics are stationary 
in the imge, or pAay depend oily on z, the imge feature index The latter is the 
case when the oriented stationary statistics mdel is used (see Section 3.3). 


WiW 2 - ■ w 1 

G^(Y t -T{M „/$) if r t - =M j 
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Asunhng independent features, the joint probability density on inage feature 
coordinates nay be witten as follow 


m r,$ 


n,*Y I TJ 


n 


i 

U ,U 2 - • • II 


n j) 


This assunption often holds 'when sensor noise cbiimates in feature fluctuations. 

The next step in the derivation is the construction of a joint prior on correspon¬ 
dences and pose. In Chapter 2, probabilistic mdels of feature correspondences vere 
discussed. The independent correspondence mdel is used here for sinplicity lie of 
the Mrkov correspondence mdel is discussed in the fol loving section. The proba- 
li lity that inage feature i belongs to the background is B y -while the remi ring prob¬ 

ability is unifornhy distributed for correspondences to the m object mdel features. 

In som situations, B i my be a constant, independent of i. Real ling Equations 2.1 
and 2.6, 

tfl) = n^) 3111 i) = 

i 

Hi or i nf ormti on on the pose i s assured to be suppl i ed as a nornal densi ty, 


B t if b; =L 
1 ~ B ' otherwse . 


M = G ^(/3-/3 o) 


-where 

=(2ti) -^1 tfp\~^exp(- . 

Hre ip p is the covariance mtrixof the pose prior and zis the dimnsionality of 
the pose vector, f3 WTithe conta nation of norml pose priors and linear projection 
mdels the systemis closed in the sense that the resulting pose estinate vill also 
be norml. This is convenient for coarse-fine, as discussed in Section 6.4. If little is 
knovn about the pose a-priori, the prior my be mde quite broad. This is expected 
to be often the case. If nothing is knovn about the pose beforehand, the pose prior 
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my be left out. In that case the resulting criterion for evaluating hypotheses wll be 
based on Mxi mmli kel i hood f or pose, and on MP f or correspondences. 

Asunhng independence of the correspondences and the pose (before the inage is 
conpared to the obj ect mdel), a lirxed j oi nt probali 1 i ty f uneti on nay be wi tten as 
follow, 

/»*• * (; ya(/3-/3 o) n H n 1 • 

*' : F=-L * : m 

This a good assurption when view based approaches to object mdeling are used 
(these are di scussed i n Oapter 4 and used i n the experi rou ts descri bed i n Chapter 
10). (Wh general 3Drotationit is inaccurate, as the visibility of features depends 
on the ori entati on of the obj ect.) Hi s probali 1 i ty f uneti on on natch and pose i s now 
used wth Ryes’ rule as a prior for obtaining the posterior probability of V and /3 

vhere ]JY) = )T) r I d/3]jY\ I) /3)p(r, ft is a nor ml izat ion factor that is formlly 
the probability of tbeimge. It is a constant wth respect toTand/^ the paramters 
being estimted 

The MP strategy i s used to obtain estimtes of the correspondences and pose 
by mxi nhzi ng tbei r posteri or probali 1 i ty wth respect to T and @ as fol 1 ow 

r, /3=argmx f3\ Y) . 

1 ,p 

E>r convenience, an objective function, L } is introduced that is a scaled logari thm 
of p(T, f3\ Y). The sam estimtes wll result if the mxinhzation is instead carried 
out over L 

r, I arttia\ | III. i 

where 


( 6 . 4 ) 


=in 


(6.5) 
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The denonhnator in Equation 6.5 is a constant that has been chosen to cancel con¬ 
stants fromthe numrator. Its Wue, which is independent of T and (3is 


C 1 


B\B 2 - ■ ■ B 
(W 1 W 2 • • • W) n 


m 2 i m 2 


i 


i 

m 


After sona nan i pd ati on the obj ecti've functi on nay be expressed as 


LV- * 


, where 


W .ffs'm »)+ e i\Aw-h" T tTHY,-nM * ah 

z ij:f: yw-j z 

( 6 . 6 ) 


A; j =1 n 


1 (1 -B i)W x W 2 




2 m 


Bi 


ii 


(6.7) 


Afen a linear projection rodel is used, T{M =M j/3 (linear projection 

mdels were discussed in Chapter 5.) In this case, the obj ecti ve function tales the 
followng simple form 


<ea=- \m .Yti'w •)+ E . (6.8) 

z ij-.i -ywj z 

Afen the background probability is constant, and when the feature cowianee 
mtrix deternhnant is constant (as when oriented stationary statistics are used), the 
formulas sinplify further - 


A=ln f \ a-/ 

\(2tt) 2 m B |^j> 2 J 

and 


(6.9) 


K&=- «)+ E • ( 610 ) 

z ij = r=JWj z 


Hre, -0is the stationary feature cowianee mtrix, and tfj , is the specialized 

feature cowi anee natri x. These were di scussed i n Secti on 3.3. 
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The first termof the obj ecti ve function of Equation 6.8 expresses the influence of 
the prior on the pose. A d scussed above, when a useful pose prior isn’t available, 
thi s termnay be dropped 

The second termhas a sirple interpretation It is a sumtakenover those inage 
features that are natched to object nodel features. The A 8 J are fixed rewards for 

naking correspondences, while the quadratic form are penalties for deviations of ob¬ 
served i nage features fromthei r expected posi ti ons i n the i rage. Thus the obj ecti ve 
function evaluates the amunt of the inage explained in term of the object, wth 
penalties for nhsnatch. This objective function is particularly sinpie in term of fi 
AfeiTis constant, /Tandits (posterior) covariance are estinated by weighted least 
squares. Afen using an algorithmbased on search in correspondence space, the es¬ 
timate of f3 can be cheaply updated by using the techniques of sequential paramter 
estination (See Elck and And d [4].) The A i 3 describe the relative value of a natch 

conponent or extension in a way that allow drect corpari son to the entailed nhs¬ 
natch penalty The values of these trade- off paramter (s) are suppl i ed by the theory 
(in Equation 6. 7) and are given in term of masuralie cbnain statistics. 

The formof the obj ecti ve function suggests an optinhzati on strategy: rate cor¬ 
respondences to object features in order to accumulate correspondence rewards while 
avoidng penalties for nhsnatch It is important that the A tJ be positive, otbervise a 

wnni ng strategy i s be to rake no matches to the obj ect at al 1. Thi s cond ti on defines 
a critical level of i rage clutter, beyand which the MP criteria assigns the feature to 
the background A j describes the dependence of the value of mtches on the amunt 
of background clutter. If background features are scarce, then correspondences to 
object features become more important. 

This objective function provides a sinpie and uniformway to evaluate mtch 
and pose hypotheses. It captures important aspects of recognition: the amunt of 
image explained in term of the object, as veil as the metrical consistency of the 
hypothesis; and it trades themoffin a rational way based on domain statistics. IVfet 
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previous approaches have rot mde use of both criteria sindtaneously in evaluating 
hypotheses, thereby losing som robustness. 


6.11 Usiig tta Mkw (xlHKpiriii^ Mil 

Mfentbe Mr hov correspondence rodel of Section 2.3 is used instead of the indepen¬ 
dent correspondence rndel, the functional formof the obj ective function of Equation 
6.6 remains essentially unchanged, aside fromgaining a newtermthat captures the 
influence of the interaction of neighboring features. The nanas of som of the con¬ 
stants changes, reflecting the difference betveen Equations 2.2 and 2.4. Nting that 
p(T, f3 | Y) is linear in p(I), it can be seen that the newtermin the logarithiiic 
obj ecti ve functi on wl 1 be: 

n — 1 

^lnr 8 (r„r 8+ i) . 

i =1 

A before, when an algorithrrbased on search in correspondence space is used, the 
esti rate of /3can still be cheap! y updated Acbange i n an el emnt of correspondence, 
som T i, wll nowadditionally entail the update of tw>of the term in the expression 
above. 


6.2 Ekperirartal Iirpierartation 

In tli s secti on an experi r an t, demnstrati ng the use of the MJtfcbj ecti ve f uncti on 
is described The intent is to demnstrate the utility of the objective function in a 
cbnain of features that have significant fluctuations. The features are derivedfrom 
real inages. The cbnainis natcbingamngfeatures froniowresolutionec^e inages. 

The poi nt- radi us feature rodel di scussed i n Secti on 5.3 i s used Gi ented stati onary 
statistics, as described in Section 3.3, are used to mdel the feature fluctuations, so 
t bat A / j =A /. 
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621 SBErdiinQuespciikBe Space 

God solutions of the objective function of Equation 6.8 are sought by a search in 
correspondence space. Search oh the whol e exponenti al space i s awi dedby heuri sti c 
pruning. 

Aiobj ecti’ve function that emirates a confgurationof correspondences, or natch 
(described by I), ray be obtained as follow: 

£(I) mx III ft . 

This optinhzationis quadratic in /3and is carried out by least squares. Sequential 
techniques are used so that the cost of extending a partial natch by one correspon¬ 
dence is 0(1) . 

The space of correspondences ray be organi zed as a di rected- acycl i c- graph (LA} 
by the followng parent-chi Id relation on natches. Apoint in correspondence space, 
or watch is a chi Id of another natch if there is som i such that V , =d_intbe parent, 

and T i =M for som j, in the child, and they are otherwse the sam. Thus, the 
chi 1 dhas one rare assi gnmnt to the rodel thanthe parent does. Thi s DQ s rooted 
in the natch where all assignmnts are to the background Al possible natches are 
reachable fromthe root. Afragmnt of an example LA7of this kind is illustrated 
inEgure 6-1. Gkponents of mtches that are not explicit in the figure are assigned 
to the background 

Efuristic beairsearch, as describedin [64], is usedto searcho\er mtches for good 
solutions of £. Success depends on the heuristic that there aren’t nany inpostors in 
the imge. Aiinpostor is a set of inage features that scores -veil but isn’t a subset 
of the opti mmnatch i npl i ed by the obj ecti ve f uncti on Aether way of stati ng the 
heuristic is that the best natch to n+1 object features is likely to contain the best 
natch to nobject features. 

The search mthod used i n the experi nants enpl oys a bootstrapp ng naebani sm 
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E gure 6-1: Ragrent of Gh’respondence Space BG 

leased on clstinguished features, Object features f, 2 and 3 are special, and mst 
lx detected The schema could lx nade robust by considering rare initial tripies 
of object features. Aternati’vely, indexing rat bods cod d be used as aneffnent and 
robust mans to iiitiate tlx search. Indexing rat bods are described by Qemns and 
Jacobs [ 19], and i n Secti on 9.1. 

Tlx algor it hmt bat ms used is outlined below 

B E AM - S E ARC H (iff 1 j 

CURRENT <— {E exactly one inage feature is notched to each of M i M 2 and M 3 } 

* ■ tlx rest are assigned to tlx background 
Rune Qirrent accord lg to C beep .50 Ixst. 

Iterate to Expoint: 

Aid to Qirrent all children of rentiers of Qirrent 
Rune Qirrent according to C beep iVIxst. 

;; Ni s reduced from‘20 to 5 as the search proceeds. 



6.2. EXPEBIDENIAL INPLEMMffll OV 



Egure 6-2: Inages used for Mtcling 


Kturn( QjRRENT) 

Sonatinas an extension of a natch wll produce one that is already in Gjr- 
RENT, that ms reached in a different sequence of extensions. Mfentbis happens, 
the natches are coalesced This condition is effnently detected by testirg for near 
equality of the scores of the item in QjRRENT. Because the features are derivedfrom 
observations contaiiing sona ranebmnoise, it is very unlikely that tw> hypotheses 
having clffering natches wll achieve the sana score, siixe tlx score is partly based 
on sunnad squared errors. 

622 Efcajie SndiRsilts 

The search nathoddesa'i bed in the previous section ms usedto obtaingood matches 
in a domain of features that ham significant fluctuations. The features were derived 
fronxeal inages. Alinear projectionmdel ms used 

Inages used for mtcling are shorn i n E gure 6-2. The object mdel ms derived 
froimset of 16 images, of whi ch the i nage on the 1 eft is an example. In this set, oiiy 
tlx light source position varied The image features used in tlx search were derived 
fromtlx image on tlx right. 

The features used for mtcling were derived fromtlx edge naps shorn i n E gure 
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Egure 6-3: Rlge Mpe used for Mtclimg 


6-3. The inage on the left shows the object mclel edges and the image on the right 
shovs the image edges. These edges ai’e fromthe Gmy edge detector [13]. Th e 
smoothing standard deviation is eight pixels - these ai’e lowresolution ec^;e mp. 

The object model edges vere derived froima set of 16 edge mpe, correspond rg to the 
16 images described above. The object model edges ai’e essentially the man edges 
wth respect to fbctuati ons i nduced ly vai’i ati ons i n 1 i ghti rg. (Iowresoluti on edges 
ai’e sensitive to lighting.) They ai’e sinhlar to the Man Rlge Images desa'ihed in 
Secti on 4.4. 

The features used in mat chi rg ai’e shovn in Egure 6-4. These' ai’e point-rad us 
features, as described in Section 5.3. The point coord nates of the features are indi¬ 
cated by dots, while the normal vector and curvature ai’e illustrated ly arc fragmnts. 
Rich feature represents - K) edge pixels. The 40 object features appear in the upper 
picture, the 125 image features lover picture. The distinguished features used in the 
bootstrap of the search ai’e indicated wth circles. The object features have been 
transformed to a newpose to insure generality 

The parameters that appear in the objective function ai’e: R the background 
probability and 'ijj the stationary feature covariance. These vere derived fronma 
natch cbm ly hand i n tlx exampl e cbm n The ori ented stati onary stati sti cs model 
of Section 3.3 ms used here. (Anornal model of feature fluctuations is implicit in 




6.2. EXPEBIDENIAL INPLEMMffll OV 


77 




E gure 6- 4: R)i it - Rtcl us Eat ures used f or Mt cli ng 
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E gure 6- 5: R)se Hi or used i n Search 

the objective function of Equation 6.8. This ms found to be a good nodel in tlis 
cbm n ) 

A1 oose pose prior ms used Ttis pose prior is illustratedinEgure 6-5. The prior 
pi aces the object in the upper left corner of the inage. The one standard deviation 
internals of position and angle are illustrated The one standard deviation variation of 
seale i s 30 percent. The actual pose of the obj ect i s vitli nthe i lxl catedone standard 
deviation bounds. This prior ms chosen to demonstrate that tlx nathod voids veil 
despite a loose pose prior. 

The Ixst results of tlx beamsearch appear in Egure 6-6. In tlx upper inage, 
tlx object features are delineated vith heavy lines. They are located according to 
tlx pose associated vith tlx Ixst natch Infix lover inage, tlx object features and 
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inage features are illustrated, -while the 18 correspondences associ ated wth the best 
natch appear as heavy lines and dots. 

The object features located according to the poses associated wth the five best 
Hatches are seeninEgure 6-7. The results are diffiult to distinguish because the 
poses are very si ihl ar. 


6.3 Search i n Rise Space 

Tbi s secti on wl 1 expl ore searcli ng the MJtfcbj ecti ve functi on i n pose space. Gkt- 
nections to robust charier matching wll be described. 

Apose esti rate i s sought fy orderi ng the searchfor naxi na of the MM)bj ecti ve 
function as follow, 

/3=argmx mx ^r, f3j . 

Subeti tuti ng the obj ecti ve f uncti on f romEjuati on 6.6 yi el ds 

1 arm ax iffl ^ [\ 3 - 1 -1\M T ip~-{Y l P \l J5 /$)] 

P ij-.PMj Z 

The pose prior termhas been dropped in the interest of clarity It would be easily 
retained as an additional quadratic term 

Tbi s equati on my be si nplibed wth the fol 1 owng debni ti on, 

D tJ (^= ^x T tp-x. 

Di /aj my be thought of as a generalized squared distance betwen observed and 
predicted features. It has been called the squared Mhalonolis distance [22], 

The pose esti mtor my nowbe wi tten as 

/3=argmx mx J2 [^j~ D -'KM J5 /$)] , 
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Egure 6-6: ftst Mtchlfesults: Rse and (SiTespoixleixes 
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or ecjri valently, as a nhrinhzation rather that mxinhzation, 


/3 =arg rim rim ^ [ D t j( Y t -Vy M ,4) —A J 

1 ij:F=Mj 


The sumis taken o\er those inage features that are assigned to model features 
(not the background) i n the natch. It my be re- wi tten i n the fol 1 oving way 


/i=argnhn > nm < 


/3 4^ ' r,: 


o if r,- =± 

n A 7U/., Hi A ill- .1/; 


or as 


/ #=argnhn ^ rim(0, nhn I) \ ) 71 \l . i ! A y) 


If the correspondence reward is independent of the model feature (tlis holds when 
oriented stationary statistics are used), A y =A In tlis case, A ; my Ire added to 
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each termin the sumvitbout affecting the nhninhzing pose, yielding the followng 
formfor the pose estinator, 

8=. arg nhn ^ nhn ( A ,, rim Di j(Yi—T{M j,fy)) ■ 

This objective function is easily interpreted - it is the sung taken over inage 
features of a saturated penalty The penalty (before saturation) is the smallest gen¬ 
eralized squared distance fromthe observedirage feature to sona projected mdel 
feature, life penalty rim jDi^x—T^M j,/3)) has the formof a \5ronoi surface, r 
described by fittedocher et. al. [42]. They describe a naasure of similarity on 
irage patterns, the Hruscbrff distance, that is the upper envelope (maximum^ of 
\5ronoi surf aces. The naasure used here di ffers i n bei ng saturated, and by usi ng the 
sumof \5ronoi surfaces, rather than the upper envelope. In their work, the upper 
envelope offers sona reduction in the complexity of the naasure, and facilitates the 
use of nathods of computational geonatryfor explicitly computing the naasure in 2 
and 3 dinansional spaces. 

Gmputational geonatry nathods right be useful for computing the objective 
function of Rjuation6.il. In higher dimensional pose spaces (4 or 6, for example) 
HTtree nathods my be the only such techniques currently available. Beuel has 
used HTtree search algorithm infeature matching. 

Mxl a, connect i on wl 1 be shown bet ween MM ear chin pose space and a ret hod 
of robust charier matching. Erst, thecbnainof MMs simplified in the followng 
way Ell stationarity of feature fluctuations is assured (as covered in Section 3.3). 
Erther, the feature covariance is assured to be isotropic. Mth these assumptions 
we have r/> 8 J =a 2 /, and D i 3 a] 2 . Alditionally, assuring constant background 

probability, we have A tJ =A The pose estinator of Equation 6.11 my now be 
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witten in the folloving sirplibedforng 

/3=argiini ^ rim(^ non . (-^ | X-T(M •, /$| 2 )) . 

/3 j zcr z 

% 

^n the projection function is linear, invertilie, and distance preserving, (2D 
and3Drigidtransforrations satisfy these properties), the estinator ray be expressed 
as follow, 


/3=argiiui^ ^ nhn(i\ lim 


1 

2^ 


V x ^-M ,| 2 )) 


This ray be further simplified to 


/3=argiiui ^ iin() d 2 {V 1 (Y t ,/3j)) , (6.12) 

^ i 

iy using the fol loving definition of a mnimnxli stance function. 


c(a) 


— non 

\72tr j 


x— M j 


(6.13) 


(banfieri ng natbods ray be usedto tabd ate approxi rati ons of d 2 (a) i naiii rage- 

1 ike array that is indexed by pixel coordinates. (bander-based approaches to irage 
registration proliena use the array to facilitate fast evaluation of pose objective 
functions, firrowet al. [3] describe an early rat hod where the obj ective function 
is the sumover mdel features of the distance fromtbe projected rodel feature to 
the nearest irage feature. Bbrgefors [8] reconnsnds the use of H\fJ distance rather 
than sunned di stance i n the obj ecti ve functi on. 

ftcently, Jiang et al. [47] described a rat hod of robust charier Hatching. In 
order to rale the mtbodless susceptible to disturbance by outliers and occlusions, 
they added saturation to the H\S objective function of Bffgefors. Their objective 
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function has the folloving form 


3 

where d j is the squared distance fromthe f th projected rndel point to the near¬ 
est inage point. Aide fromthe constants and square root, which cbn’t affect the 
nhninhzing pose, this objective function is equivalent to Equation 6.12 if the role of 
inage and mdel features is reversed, and the sense of the projection function is in¬ 
verted Ji ang et al. showi rpessi ve resul ts usi ng robust charier ratchi rg to regi ster 
ndti-mdal 3Dmdical inagery. 

6.4 DateiBioiB 

MP Mdel Mtcling perform well on low resolution inagery in which feature 
uncertainty is significant. It could be used to bootstrap a coarse-fine approach to 
mdel mtcling, yielding good results wth reasonable running tinas. (harse-fine 
approaches have proven successful in stereo mtcling applications. (See Gimon 
[33] and Brnard [2].) A coarse-fine strategy is straightformrd in the framvwrk 
described here. In a hierarchy the pose estimte fromsolving the objective function 
at one seal e i s used as a pri or for the esti mti on at the next. Bvi ng a good pri or on 
the pose wll greatly reduce the arount of searcling required at high resolution 
Ending a tractalie mdel that incorporates pose dependent visibility conditions 
w>ul d be useful for appl yi ng MM n non vi ewbased recogii ti on 

6.5 M ated Work 

The HHRvision systemof i^ache and Bugeras [1] uses sequential linear-least- 
squares pose esti mti on as well as the linear 2Dpoint feature and proj ect ion mdel 
descri bed i n Secti on 5.2. EffiERi s descri bed as a search al gori thm Efferent cri teri a 
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are used to evaluate candidate catches and to evaluate competing ‘Stole” hypothe¬ 
ses. Ai ad hoc threshold is used for testing a continuous naasure of the mtrical 
consistency of candidate natch extensions. Stole natch hypotheses are evaluated 
according to the amunt of inage feature accounted for - although not according to 
overall mtrical consistency. HfERvorks veil on real inages of industrial parts. 

Gad outlined a f&yesian strategy of natch evaluation based on feature and 
background statistics in his paper onautonatic programing for rodel - based vi si on 
[29]. In his systeng search vas controlled by thresholds on probabilistic masures of 
the reliability and plausibility of retches. 

low descri bes i n general term the appl i cati on of IJjesi an techni ques i n hi s book 
on Vsual Kcognition [51]. H treats the liniiization of expected rumirg tim of 
recognition. Inadditionhe discusses selectionamng numrous objects. 

Obj ect recogni ti on rat chi ng system often use a strategy that can be sunmri zed 
as a search for the mximl retching that is consistent. G5nsistency is frequently 
defined to man that t he rat chi ngimge feature is w thin finite bounds of its expected 
position (bounded error rodel s). GW system[14] i s one exarpl e. Such an approach 
my be cast in the framework defined here by assuring urnfimprobability density 
functions for the feature deviations. Rrse solution wth this approach is likely to be 
rare complicated than the sequential linear-least-squares mttod that can be used 
vten feature deviations have norml mdels. GW approach effectively finds the 
global optimmof its obj ective function. It perform veil on occluded or fragmnted 
real images. 

Btveri cjge, W ss and Rsenan [ 6] use an obj ecti ve f uncti on f or 1 i ne segmnt based 
recognition that is sinhlar to the one described here. In their vork, the penalty for 
deviations is quadratic, vbile the reward for correspondence is non-linear (exponen¬ 
tial) inthe amunt of nhssing segmnt length (^contrast, the reward descri bed in 
this paper is, for stationary mdels, linear inthe length of aggregate features.) The 
trade- off parameters i n thei r obj ecti ve f uncti on vere deternhmed emp ri cal 1 y. Thei r 



CHAPTER6. MPMUELMTCEim 


systemgives good perf or ranee inaebnainof real i rages. 

Ehrns and Rsenan [ 12] andRrns [11] describe a classification basedrecognition 
system They focus on the use of description net wads for effiiently searching amng 
nd ti pi e obj ects wth a recursi ve i ndexi ng sebem. 

Hnson and Bra [ 27] [ 26] descri be a general obj ective f uneti on approach to i rage 
understanding. They use a nhn innmdescription length (ML) criterion that is 
desi gned to w>rk wth generic object mdels. The approach presented here is tailored 
for specific object nodels. 

6.6 Sunnary 

AMPmdel Hatching technique for visual object rec pgn i ti on has been described 
The resulting objective function has a simple formvhen nornal feature deviation 
mdels and linear projection mdels are used Eqerimntal results were shovm 
indicating that MPMdel Mtcling works veil in lowresolution matching, where 
feature devi ati ons are si gni hcant. R1 ated work was di scussed 
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Posterior Mirgi nal Pose 
Es t i nation 


In the previous chapiter onMPMdel Mtchingthe object recogniti on proli einvas 
posed as an optinhzation proliemresulting froma statistical theory. In that fornn- 
1 at ion, complete hypotheses consist of a description of the correspondences betveen 
inage and object features, as veil as the pose of the object. The nathodvas shovn 
to provi de effecti ve eval uati ons of mtch and pose. 

Thefornnlationof recogniti on that is describedinthis chapter, Bsterior Mrginal 
Bse Estimation 1 (0®), builds onMPMdel Mtcbing. It provides a smoth 
objective function for evaluating the pose of the object - vitbout cormtmnt to a 
particular mtch The pose is the cost important aspect of the proli erg in the sense 
that knoving the pose enables grasping or other interactionviththe object. 

In thi s chapter, the obj ecti ve functi on i s expl ored by proli ng i n sel ected parts of 
pose space. The domain of these experimnts is features derived fromsynthetic laser 
radar range imagery, and grayscale video imagery. Alinoted pose space search is 
performd i n the vi deo experi ri an t,. 

In Chapter 8 the Expectation - Mxinhzation (EA)1 algorithms discussed as a 
1 An early version of this vork appeared in [76] 
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mans of searchi ng for 1 ocal mxi m of the obj ecti ve functi on i n pose space. 

AH ti onal experi mnts i n obj ect recogni ti on usi ng the FMEobj ecti ve f uncti on 
are described in Chapter fO. There, the EMalgorithmis used in conj unction wth 
an indexing mthod that generates initial hypotheses. 


7.1 Obj ecti ve Kmcti on for Rise 

Thefollowngmthodwas mti vatedbythe observationthat in heuristic searches over 
correspondences wth the obj ecti ve function of MP Mdel Mtcbing, hypotheses 
havi ng i npl ausi li e mtches scored poorl y i n the obj ecti ve f uncti on The i npl i cati on 
was that sunnnng posterior probability over all the mtches (at a sped he pose) light 
provi de a good pose eval uator. Thi s has proven to be the case. Athough i ntii ti vel y, 
this light seerdike an odd my to evaluate a pose, it is at least demcratic in that 
all poses are evaluated in the sam my The resulting pose estinator is snooth, 
and is amnalie to local search in pose space. It is not tied to specific mtches - 
it is perhaps in leeping wth Mrr’s reco nmn dati on that conputational theories of 
vi si on shod d try to sati sf y a pri nci pi e of 1 east connttmnt [ 52]. 

AH ti onal mti vati on ms provi ded by the w>rk by Adi lie, Gfi ger and B1 tboff 
on stereo [ 78]. They di scussed conputi ng di spari ti es i n a stati sti cal theory of stereo 
where a mrginal is conputedover mtches. 

In MP Mdel Mtcling, joint hypotheses of natch and pose were evaluated by 
their posterior probability, given an imge - p(T, f3 \ Y). V and f3 stand for cor¬ 
respondences and pose, respectively and Y for the imge features. The posterior 
probability was built fromspecihc nodels of features and correspondences, objects, 
and proj ecti on that were described in the previous chapters. The present fornnla- 
tion wll first be described using the independent correspondence mdel. lie of the 
Mrkov correspondence mdel wll be described in the folloving section 

lire we use the sam strategy for evaluating object poses: they are evaluated 
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by their posterior probability, given ail inage: p(/3 | Y). The posterior probability 
density of the pose my be computed front he j oint posterior probability on pose and 
natch, by formally taking the marginal over possible Hatches: 


m n • 

r 

bn Section 6.1, Equation 6.4, p(T,/31 Y) vas obtained via Qjes’ rule froimprob- 
alilistic models of inage features, correspondences, and the pose. Substituting for 
iT } /3\ b), the posterior marginal nay be written as 


m n = e 

r 


im 


( 7 , 1 ) 


lii ng equati ons 2.1 (the i ndependent feature mdel) and 6.2, we ray express the 
posterior marginal of /Tinternaof the component densities: 


M U = TprEE" n : II •> I Df) 

IV1 ri r 2 r„ i i 


or 


M i) = 


M 


EE" Elllfd'. i .)] • 


ri r 2 r„ i 

Beaking one factor out of the product gives 


AP\ Y) = 


AP \ ' \ ' _ _ \ ' 

ih V) 

Ti r 2 r n 


'n — 1 


nt^- 1 V,PAt .-)] 


i =1 


iY n I r 


or 


API ?) = 


AP \ ~~ \ ~~ _ _ \ ^ 

ih yi 

ri r 2 rvi 


'n — 1 


IlNu i h,mt .)] 


.i =1 


J2AY n I T n ,M r 
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Gkiti ni ng i n si nhl ar fashi on yi el ds 


m P = y^-II EiU', I c./«r 


TM s nay be wi tten as 


m y= ^lUY t \ & , 


AYi I ft = '£AY i I Ti,ftiT i) 


fitting the T i surainloits cases gives, 


AYi I ft =iY 1 n =J, M? i =-0 + T,AYi | c =M „ ftiv i =M 3 ) 


Substituting the densities assunadintbe mdel of Section 6.1 in Equations 6.1 and 


2.2 then yields 


*Yi\*= Wi \. |/ -+E g*,<« -m ,■>/» 


Installing this into Equation 7.2 leads to 


j a\ Vi— B1B2 B lift tt i . y ^1 • • • 1 Bj . n \ i > 

^ Y) ~ {W 1 W 2 - ■ ■ W) n iY) n m 5, ^ 8 ^ 


A i n Secti on 6.1 tie obj ecti \e functi on for Rsteri or Mrgi nal Rse fiti rati on i s 
defined as the scaled levari thmof the posterior mrgi nal probali 1 i ty of the pose, 


=ln 


AfA Y) 
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where, as before, 


C 1 


b 1 b 2 ■ ■ ■ B 

(w 1 w 2 • • • W) n 



i 


This leads to the following expression for the obj ective function (use of a norml pose 
prior is assured) 


Tj -1 


KjfSj —— - (fS-fl o) i’ 


fj {fBfS o) 


-Ei 


n 


i+E 


Ikj 


Wl-B 


M, 


rri 


II 


( 7 , 5 )' 


This obj ective function for evaluating piose hypotheses is a smooth function of the 
pose. Mthods of continuous optimization ray be used to search for local maxima, 
although starting values are an issue. 

The first termin the IMIl .olijective function (Ejuation 7.5) is due to the piose 
prior. It is a quadratic penalty for deviations fromthe norimal piose. The second 
termessenti ally measures the degree of alignment of the object model viththe image. 

It is a sumtakenover inage features of a s moth nonlinear functi on that peals up 
posi ti vel y when the pose bri ngs obj ect features i nto al i gnrent wth the i nage feature 
in question The logarithriic termwll be near zero if there are no model features 
close to the inage feature in question 

In a straigbtformrd irplemntation of the obj ecti ve function, the cost of evalu¬ 
ating a piose i s 0)rn) , si nee i t i s essenti al 1 y a non 1 i near cbubl e sumover i nage and 
mdel features. 


7.2 L£i rg the Mrkov Correspondence Mdel 

Mabn the Mrkov Ghrespondenee nadel of Section 2.3 is used instead of the in 
dependent correspondence nadel, the sunning techniques of the previous section 
no longer apply Because of this, a computationally attractive closed formforrda 
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for the posterior probability no longer obtains. Nevertheless, it wll be showi that 
the posterior probability at a pose can still be effiiently evaluated using dynamo 


prqgrannmng. 


Rf erri ng to Ejuati on 7.1, and usi ng the i ndependenee of natch and pose i n the 
prior (discussed in Section 6.1), the posterior marginal probability of a pose ray be 
written as follow, 


AP I 


r AY) 


lii ng Ejuati ons 2.3 and 6.1, 


M I U 


M 


E rfi'i 

r 


TuftAY 2 1 r 2 ,/$- • • AY n | .nr,)- • • 

r i(Ti, r 2 )r 2 (r 2 , r 3 ) • • •y , _i(r n _ 1 ,r n ) 


This nay be re-written as follow, 


^IU= E 




n — 1 


n^r.on^r 


i +y 


rir 2 ... £ u =1 i =i 


(7.6) 


where 


c t =Ay i | Ti,p4r i ) • 


Rre, the dependence of con /3has been suppressed for rotational brevity 

.Yxl it wll be shown that p(/31 Y) nay be witten using a recurrence relation 


MY) = H-E A .-i(iVMr„) , 


(7.7) 


where 


6i(tj) = E c i( i ) r i(v«) 


(7.8) 


and 


/>„+! (ti) = E M^n+l (6)r„ +1 (V 6) 


( 7 . 9 ) 
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Expand ng Rjuati on 7. 7 i n term of the recurrence rel ati on, 

MV= 44-E E c n (r n ) , 

K y ] r„ L r ^i 
or 

iA y)= E h n -2{T n -i) f[ CiiTi) r^iT^Tn) . 

T ) r„^i r„ i =«-i 

^ai n usi ng the recurrence rel ati on, 

ifs\y}= E E ^-3(r„_ 2 )c„-2(r„_ 2 )r„_ 2 (r„_ 2 , r„_o 

K y j rvi r„ L r ^2 

n 

■ n c 8 (r 8 ) r n _i(r n _i,r n ) , 

i =n — 1 

or 

Bl n n ~ 1 

m y>= fS- £ n c .(r.) n '•.(r„r. + i) 

71 ' r^2 T^i T„ i=n- 2 i =n-2 

Gkitinuing insinhlar fashion leads to 

rtf'll 44- £ Ai(r 2 )f[c.(r,)nr,(r„r,_ B ) , 

71 * * r 2 r 3 .. .rT i =2 i =2 

and nowusing the base expression for h i(- ), 

m «- 44- £ £ Cl (r,)i-i(Tj,r 2 ) ii-=.( r .)n'-i(r,,r.-n) 

T 1 ) r 2 r 3 ...jL r i J i =2 «'=2 

or finally, 

jWIW- 44- £ [n«(r i )ffr i (r i ,r iH ,) • 

T 1 ) rir 2 ...jU=i *=i 

whi ch i s the sana as Rjuati on 7.6. Thi s conpl etes the wri hcati on of Rjuati on 7. 7. 

hfext, a dynamo prc^rannmng algorithmwll be described that effiiently eralu- 
ates an obj ectiw function that is proportional to the posterior marginal probability 
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of a pose. The objective function is ^j-^/3 \ Y) . The algor it hmis a direct inple- 
mntati on of the recurrence defined i n Rjuati ons 7. 7, 7.8 and 7.9, that bui 1 ds a tali e 
of mites of h ,•(• ) fromthe bottomup. N>te that h 4 -(&) onlyhas tw>mlues, depending 
on whether 6=Lor not. In the foilowng description, the syrhol T is used to stand 
for ananonyrous mdel feature. H . .denotes array locations that store mlues of h 
and if• , • , • ) is an access function, defined belo\y that accesses the stored mlues. 

; ; ; lie Eynarh c Pr ogr aimi ng to eva 1 uate PAPEwi th M rkov Correspondence M>del . 
Evaluate- Pose(/T) 

Hi±^Z b qi,kftr r (M 

Hu E b (I 1 , k fyr i (h, 1) 

K>r i <—2 T> iV— 1 

H 8 l^E 6 ^-1 Mhkfrr n+1 (M 
H t T^Z b Hi-l,b)a,i,k®r n+ 1 (hi) 

Return (E& HA 1. 1*0 n h /$) 

; ; ; define the a uxi 1 i ary f uncti on 0. 

Returnee i | l$q(b)) 


; ; ; Access val ues of Hs t or ed i n a t abl e. 
lfa,b) 

If b =L Return (H aJ _) 

Else Return (H aT ) 

The loop in EVALUATE- Ft)SE executes (fn) tinas, and each tina throtgh the 
loop does (^rig emlrations of the sunnands, so the conplexity is (tyrri). This 
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has the sam complexity as a straightforwdinplemntationof the IMFT.ol ;j ecti ve 
function when the Mrhov mdel i s not used (Ejuation 7.5). 

The sunning technique used here was described by (beesenan [17] in a paper 
about using naxi moment ropy mt hods inexpert system. 


7.3 Rmge I rage Ekperi rail 

Aiexperinant investigating the utility of Rsterior Mrginal Rse Etination is de¬ 
scribed in this section AHtional experimnts are described in Chapter 10. 

The objective function of Equation 7.5was sanpledina cbnainof syntheti c range 
imgery. The feasibility of coarse-fine search mthods was investigated fy sampling 
smothed variants of the objective function 

7.31 EVjmtiairf’ Edms 

The preparation of the features used in the experimnt is sunnarized in Egure 7-1. 
The features were oriented-range features, as described in Section 5.4. Tro sets of 
features were prepared, the “rodel features”, and the “i rage features”. 

The object mdel features were derivedfroma synthetic range irage of an M5 
truck that was created usi ng the ray traci ng prqgramassoci ated vith the HL 0® 
Rchage [ 23]. The ray tracer was mdi hed to produce range i rages i nstead of shaded 
inages. The synthetic range irage appears inthe upper left of Egure 7-2. 

In order to si nd ate a 1 aser radar, the syntheti c range i rage descri bed above was 
corrupted wth sinnlated laser radar sensor noise, using a sensor noise rodel that 
is described fy Shapiro, Ifeinbold, and Rrk [62], In this noise mdel, masured 
ranges are either validor anoralous. 5didmasuremnts are norrally distributed, 
and anonal ous masuremnts are uni fornhy di stri buted The corrupted rarge i rage 
appears i n E gure 7- 2 on the ri ght. T> si ml ate post sensor processi ng, the corrupted 
inage ras “restored’ via a statistical restoration mthod of Mu on and WTs [56]. 
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E gure 7-1: Keparati on of Katures 










Egure 7-2: Synthetic Ringe Inage, N>isy Rnge Inage, aixl Rstored Ringe Inage 
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The restored range inage appears in the lowr position of Egure 7-2. 

Qientedrange features, as described in Section 5.4, were extractedfrointhe syn- 
thetic range inage, for use as rndel features - andfromthe restored range inage, 
these are calledthe noi syfeatures. The features were extractedfromthe range i nages 
in the followng manner. Kinge discontinuities were located ty thresholdng neigh- 
bori ng pi xel s, yi el d ng range d sconti nui ty curves. These curves were then segnanted 
into approximately 20-pi xel -1 ong segments via a process of line segnant approxi ra¬ 
ti on. The 1 i ne segnants (each representi ng a fragrant of a range d sconti nui ty curve) 
were then concerted i nto ori ented rarge features i n the fol 1 owng manner. The X and 
Y coord nates of the feature -vere obtained fromt he naan of the pixel coord nates. 

The normal vector to the pixels was gotten via least-squares line fitting. The range 
to the feature was esti rated ty taki ng the naan of the pi xel ranges on the near si de 
of the d sconti nui ty This information was paclaged into an ori ented rarge feature, 
as described in Section 5.4. The model features are showiinthe first image of Eg¬ 
ure 7-3. Each line segnant represents one ori ented rarge feature, the ticks on the 
segnants i nd cate the near si de of the rarge d sconti nui ty There are 113 such obj ect 
features. 

The noisy features, derived fromt he restored rarge inage, appear in the second 
inage of Egure 7-3. There are 62 noisy features. Som features have beenlost due 
to the corruption and restoration of the rarge inage. The set of image features ras 
prepared front he noi syfeatures by rancbnhy deleting half of thefeatures, transform 
ing the survivors accord ng to a test pose, and addng suffiient rancbnhy generated 
features so that | of the features are due to the obj ect. The 248 i nage features appear 
i n the tli rd i nage of E gure 7- 3. 

7.32 Sajiirg The (fijecthe Entiai 

The obj ective function of Equation 7.5 was sampled along four straight lines passing 
through the (knowi) location in pose space of the test pose. Qi ented stationary 
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statistics we used, as described in Section 3.3. The stationary feature comriance 
was esti rated froma band natch done wth a muse bet wen the mdel features and 
the noisy features. The background rate paramter Bm s set to |. 

Sanples taken along a line through the location of the true pose in pose space, 
paral 1 el to the Aaxi s are sbowii n E gure 7-4. Hi s corresponds to mvi ng the obj ect 
along the Aaxis. The first graph show sanples taken along a 100 pixel length (the 
inage is 256 pixels square). The second graph of Egure 7-4 show sanples taken 
along a 10 pixel length, and the third graph show sanples taken along a 1 pixel 
length The ^coordinate of the test pose is 55.5, the third graph show the peak of 
the obj ecti ve functi on to be i n error fy about one t wnti eth pi xel. 

Sanples takenalong a 1 i ne paral 1 el to the yaxi s of pose space are sbowii n E gure 
7-5. This corresponds to a simultaneous change in scale and angular orientation of 
the obj ect. 

Each of the abote graphs represents 50 equally spaced sanples. The sanples are 
j oined wth straight line segments for clarity Sanpl i ng was al so done paral 1 el to the 
yandz/axes wth similar results. 

The sanpl ing described in this section show that in the experimntal domain the 
obj ecti te f uncti on has a pro min ent, sharp peak near the correct 1 ocati on Som 1 ocal 
raxim are also apparent. The observed peak ray not be the dominant peak - no 
global search ng was performed. 

Cbarse-Ene Sampling 

A3di ti onal sanpl i ng of the obj ecti ve of Rjuati on 7.5 ras perf ornad to i nwsti gate the 
feasibility of coarse-fine search techniques. Acoarse-fine search natbod for finding 
raxim of the pose-space objective function wiuld proceed as follow. Rais are 
ihtially located at a coarse scale. A each stage, the peakfromtbe previous scale is 
used as the starting mite for a search at the next (less srooth) scale. 

The obj ectiw function was smoothed fy replacing the stationary feature comri- 
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ance natrix xb i m the fol 1 owng nan ner: 


V’+V’ s • 

The effect of the smothing natrix xfj s is to increase the spatial scale of the co- 

wiance natrices that appear in the objective function. 

Robes along the Aaxis thro ugh the known 1 ocati on of the test pose, wthwious 
anaunts of snaothi ng are showi i n R gure 7- 6. The snaothi ng natri ces used i n the 
proling were as follow, in the sam order as the fgures. 

DA|(.l) 2 , (-1) 2 , (10.0) 2 , (10.0) 2 ) , 

DA|(.025) 2 , (.CG5) 2 , (2.5) 2 , (2.5) 2 ) , 

and 

DAJr(.Ol) 2 , (.01) 2 ,1.0,1.0) . 

where DA|- ) constructs diagonal natrices fromits argunants. These snaothi ng 
natrices wre deteriimed erprically. (N> smothing was performd in the fourth 
fgure.) 

These snaothed sanpling experimnts indicate that coarse-fine search ray be 
feasible in this cbnain. In Egure 7-6 it is apparent that the peak at one scale ray 
be used as a starting mire for local search in the next scale. This indicates that a 
final 1 i ne search al ong the XsA s coul d use the coarse fine strategy. It i s not suffii ent 
evidence that such a strategy wll w>rk in general. A before, there is no guarantee 
that the locatedraximinis the global raximm 
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7.4 \f deo Imge Ekperi mrt 

In this section, another experimnt wth the FMEobj ecti ve function is described. 

The features are point-radius features derivedfromvideo inages. Alocal searchin 
pose space is carried out, and the objective function, and a snootbed wiant, are 
probed i n the vi ci ni ty of the peak 

7.4.1 EVjmtiairf’ Ertms 

The features used in this experimnt are the sam as those used in the MPMdel 
Mtchi ng correspondence search experi mnt reported i n Secti on 6.2. They are poi nt- 
radius features, as described in Section 5.3. The features appear inEgure 6-4. 

7.42 SBErdiinBee Spaoe 

Asearchras carried out in pose space froma starting mlue that was deter rimed by 
band. The search was inplemnted wth Rrwll ’ s mtbod [59] of ndtidimnsional 
non-1 i near opti rizati on Rwel 1 ’ s mtbod i s sinhlartotbe conj rgate- gradi ent mtbod, 
but derimti’ves are not used. The line nhnirizations wre carried out wth Bent’s 
mtbod [59], which uses successive parabolic approxinations. The pose resulting 
fromtbe search is illustrated in Egure 7-7. This result is close to the best result 
fromtbe MP Mdel Mtcling correspondence search experi mnt. That result is 
reproduced here inEgure 7-8. It is corfiorting that these tw> substantially different 
searchmtbods (coriinatorial versrn continuous) provide sirilar answers in, at least, 
one experi mnt. 

7.43 Sajiirg TEb (fijecthe Erctiai 

Sanjies were taken along four straight lines passing thro ugh the peak in the objec¬ 
tive function resulting fromtbe search in pose space reported abo^e. (In the range 
experimnt, sanpling was done thro ugh the knowitrue pose.) The results are ill us- 
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Smoothing vas performd at one scale. The smotbing matrix was 

DA$(.03) 2 , (.03) 2 , (3.0) 2 , (3.0) 2 ) . 

Holing, perfornad in the sana manner as in Egure 7-9 vas perfornad on the 
smoothed objective function. The results are shown in Egure 7-10. In comparison 
to the range i nage experi ran t,, 1 ocal maxi na are more of an i ssue here. Thi s ray be 
partly due to the background features here having more structure than the randonhy 
generated background features usedintbe range inage experiment, ftcause of this, 
anomalous pose estimates (where the pose correspond mg to the global maximnmnof 
the obj ecti ve functi on i s seri ousl y i n error) ray be nore 1 i kel y i n thi s cbrai n than 
in the range experiment. 


7.5 Illation to R>bu3t E£ti ration 

This section describes a relationship betveen I Mil, and robust estimation 1^ 
simplifying the domain a robust estimator of position is obtained A connection 
betveen the si npl i fled robust estimator and neural metvorks is dscussed 
(bnsider the folloving simplifications of the domain 

• drop the pose prior 

• the object has ore feature 

• the image is ore-drnensioral vithwdth W 

• the pose is a scalar 


• the projection function translates: TTy ,/$ =/3 
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Whthese simplifications, the observation mdel of Equation 6.1 becoms 


AY< | rj 


< 


x 

w 


GAY -/$ 


if Ti =T 
otbervise 


5 


where 

aw= TST"* (-£*)' 

In this si npl i bed cbnai n T ray be interpreted as a collection of variables that de¬ 
scribe the validity of their corresponding masurensnts in Y. Thus, T 4 - /=d_naybe 

interpreted as naaningtbat Y i is valid, andT =Las Y i beingimalid ]iY ;) is defined 
to be zero outside of the range [ ~y~i y] • 

The prior on correspondences of Equation 2.2 takes the folloving form 


A?i) 


I b if r ,■ =l 

) 1 —B otbervise 


Uingf&yes’ rule and the independence of V and /3al lov® the fol loving probability 

of a sanple audits validity, 


AYi, Ti | ®=AY i | AMT 


/ 

B_ 

< ^ 

(1-#G AY-A 


if 0 =T 

otbervise 


( 7 . 10 ) 


The probali 1 i ty of a sarpl e ray nowbe expressed by taki ng a rargi nal over the 
probability in Equation 7.10, as follow, 


AYi I A = J2AYM, | /$ = ^ -Ai-m AY -/$ 

r i 


IMni ng an obj ecti ve functi on as a 1 cag 1 i kel i hood of f3 
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1 eads to the anal og of the IMIl.ol ;j ecti ve functi on for thi s si rpl i hed cbmi n, 

M = 4W-f)] . (7,u) 

i 

Thi s my al so be wi tten 

m= (7i2) 

i 

where 

-^ =ln ° (i > ■ 

Thi s i s the Mxi mrnli M i hood obj ecti ve functi on for esti rati ng the man of a 
normal population of variance a 2 , that is contaiimated vith a uniformpopdationof 
wdth \y where the fraction of the rirxture due to the uniformpopdationis B 

The function is approximately quadratic when the residual is snail, and 
approaches a constant when the residual is large. \feii|pes to zero, beconas 
quadratic, and the estimtor beconas least squares, for the case of a pure normal 
population. Mfen is viewed as a penalty function, it is seen to provide a 
quadratic penalty for snail residuals, as least squares does, but the penalty saturates 
when residuals becom large. R)bust estimation is concerned wth estimators that 
are, like this one, less sensitive to outliers that least squares. A vith many robust 
estimators, the resulting optimization proliemis more diffiult than least squares, 
since the objective function is non-convex. This estimtor falls into the class of re¬ 
descending Mestimtors as di scussed by Hiber [41]. 

IMIl.i s somwbat different fromrobust esti mti on in that the saturating aspect 
of the objective function not only decreases the influence of “outliers” (fy analogy, 
the background features), it also reduces the influence of image features that don’t 
correspond to (are not close to) a given object feature. 
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7.5.1 (chreticnto Mid MwakSgridEntiai 

There is ail inportant connection betveen the estimtor of Equation 7.12 and the 
signrndfunction of neural networks, 

l-|exp(—a) 

The sigmid function is a srnoth variant of a lexical swtehing function that has 
beenusedfor mdeling neurons. It has been used extensively by the neural network 
co nrarm i ty in the construction of networks that classify and exhibit som form of 
learning behavior. The bEKalk neural network of Sejnowski and Rsenberg [61] is 
a wsl 1 knowexanpl e. 

It turns out that, under som conditions on the paramters, the sigmid function 
of x 2 is approximately equal tof^a), ignoring shifting and scaling. This near equality 
is illustrated in Egure 7-11. 

The two functions that are plotted in the fgure are 


/(a) =2.0 [c(x 2 ) —.5] and g(a) 


ln[.25+.75exp(—a; 2 )] 

ln[.25] 


The upper graphsbows entity plottedtogetber, while the low graphshow 
their difference. It can be seen that they agree to better than one percent. 

Because of this near equality, for a special case of the paramters, anetworktbat 
emirates the ML estimtor of Equation 7.12 for a contaiimated norml population 
wll have the fornillustrated in Egure 7-12. 

This network, wth its arrangemnt of sigmid and summits seem to fit the 
defini ti on of a neural network 

The robust estimtor of Equation7.12, audits neural network approximation, are 
(approximately) optiml for locating a Gussian cluster inuniformnoise. 

Asinhlar neural network realization of the FMEobj ecti ve function would like¬ 
wise be near optiml for locating an object against a uniformbackground 
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E gure 7-11: j{ aixl ,?), aixl j{ ,?) — ,?) 
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7.6 IME HRci eixy Bbund 


This section provides a lover bound onthe covariance mtrixof the FMEestinator. 
Btinators of vector paranaters (life pose) my be characterized ty the covariance 
mtrix of the estinates they produce. The Ganar-Ro bound provides a lover 
1 i riit for the covariance mtrixof unbiased esti mtors. FBiased estimtors that 
achieve this bound are call efficient estimtors. Bscussions of estimtor effiiency 
and Gansr- Rio bounds appear i n [ 63] and [ 72]. 

The Gamr-Ro bound on the covariance mtrix of estimtors of fl based on 
observations of Jfis given by the inverse of the Esher i nf or mt ion mtrix, 

ax h >i t'(A ■ 

Hre, CO(- ) denotes the covariance mtrix of the rancbmvector argunant. This 
mtrix inequality mans that GbX( —I is positive senh- definite. 

The Esher informtion mtrix is defined as follow, 

lc i I'. x([Vpln^ /$] [V pin/$] T ) 

where V pis the gradient wth respect to $ which yields a col urn-vector, and E 
stands for the expected value of the argunant wth respect to 

The covariance mtrix, and the Ganar-Ro bound, of the FMEestinator are 
diffiult to calculate. Instead, the Ganar-Ro bound and effiieney wl 1 be deter- 
nhned for estimtors that have access to both observed features Y p and the 

spondenees T p The Ganar-Ro bound for these “coqiete-data” estimtors wl 1 be 
found, andit wll be shown that there are noefEient conplete-data estimtors. Bb 
cause of this, the FMEestinator is subject to the sana bound as the corplete-data 
estimtors, and the FMEestinator cannot be effnent. This follow, because the 
FMEesti mtor can be consi dered to be techni cal 1 y a conplete- data esti nator that 
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ignores the correspondence data. 

Interna of the conpl ete-data estimator, the Esher information has the following 
forrp 

Hft = E y^tV^ln^rl ft] [V pln^lfr| ft] T ) . 


Asunhng independence of feature coordinates and of correspondences, the prob¬ 
ability of the conplete- data i s 


Ax?\ft= n,iv v * • 

i 

lii ng f&yes rul e and the i ndependenee of V and ft 

AYi, r,-| ®=AY , | v-,ftA r ,-) . 


ftf erring to Equations 6.1 and 6.3, and using constant baclground probability 
and linear projection, the corplete-data conponent probability ray be witten as 
follow, 

AXi, Vi | ft = 

Whi ng tomrds anexpressi onf or the E sher i nf ornati on, -ve di fferenti ate the conpl ete- 
data probability to obtain 


WiW 2 ... w 1 

.(Yi-Mtf if Vi =M j 


v ft V 


ylnll A Y iXi I ft = Y, 

i i 


VpAYi, r ,-1 ft 

AYiXi I ft 


MfenT i =4 V pftftTiXi I ft =0? otherwse, in the case V 4 - =M 


v^y,-,r,-| ft=v 


1 -B 


G „JY, -M,a, 


m 
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Dfferentiating the nornal density (a form!a for this appears in 8.3), gi’ves 

V/jtfy,-, r t - I $=(-) l - = ^-G , 

so that 

V <rlrf| *" r >= M > ■ 

Then the gradi ent of the corpl ete- data probali 1 i ty ray be expressed as 

VMvri a — y. Mf+T'm • 

i j : £=M J 

Mte that setting this expression to zero defines the Mxinnmlikelihood estinator 
for /Tin the conplete-data case, as follow: 


£ MJCY,= £ 

I j : J=M j i j : r=M j 


or 

/3= ( £ Mfc/Ar) £ MftTfYi ■ 

This estinator is linear inK The imerse has been assured to exist - it wll exist, 
providedcertainlinear independence conditions are rat, and enough correspondences 
to mdel features appear in the natch. This typically requires tw> to four correspon¬ 
dences in the applications described here. 

ftturning to the Esher infornation, w need to emirate the expectation 


r i 

~ 

£ mJCu, 

£ Hf+Tfri 

y L* 3 : j 

J j : F=M J 


5 


(7.15) 


If — E y, r 



118 


CHAPTER 7. PC8TEIECRMRKINAL PCSEESTlNATION 


where the if th residual has been written as follow, 


e tl =Y t —M j{3 


ft- nanhng i ndi ces and pd 1 i ng out the sum gi 'ves 


If =E y, r I 




'\i j ■ I=M j i’j’: r ; /=Mj/ 


ftferring to Equation 7.14, the expectation my be split and rowd as follow, 


* ' i y y 

\i j ■ r=M J i’j’\ r ; /=M 


M J j E ^|r(e; 


The inner expectation is o\er nutually independent (Russian rancbmvectors, and 
equals their cowianee mtrix when the indices agree, and is zero otherwse, so 


If =E r ( y y 


I y ij:lhMj i'j': T t i=M 


This expression sirplifbs to the followng: 


If =E r 1 y Mji’-’M, 

i j : J=M j 


The sumnay be re- wi tten i n the f ol 1 owng way by usi ng a del ta f uncti on conpari ng 
ft and M j , 


If - E r (^r.Mj) Mf ft j lift - ^ E r , (ftM,) Mj ft j lift . 

i j i j 

The expectation is just the probability that an imge feature is natched to som 
mdel feature. This is 1—ft so the Esher inf ornati on my be written in the followng 
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simple forng 


If — 


1 -B 


* 3 


m 


MjiP-lM, , 


or as, 


If =a-fy> 


m 


* 3 


This is an attracti ve result, andray be easily interpreted, in relation to the Esher 
information for the pose 'when the correspondences are fixed (a standard linear esti¬ 
mator). The Esher information in that case is jMj pi”/ Mj,it maybe interpreted 

as the sumover matches of the per-natch Esher information 

In light of this, the corplete-data Esher information is seen to be the arnrage 
of the per-match Esher information, multiplied by the expected number of features 
matched to the mdel, (1 — T}n 

Aieffiient unbiased estimator for the complete- data exists if and only if 


/3=/3+/ ,'iiV In/ihb /$ 


This requires that the right hand side be independent of $ since the estimator 
(Equation 7.15) is not a function of f3 Expanding the right hand side, 


/3+ 


(l-$n 

m J J 

1 3 


-l 


£ -MjH 


i j : J=M j 


This is not independent of f3 Qe my to see this is to note that the factor ml ti plying 
/Tin the second termis a function of T. Thus, no effiient estimator exists in the 
complete-data case, and consequently, no effiient estimator exists for IMIE 


7.7 Mated Wk 

Geen [31] and Geen and Shapiro [32] describe a theory of MxinamEkelihood 
laser radar range profiling. The research focuses on statistically optimal detectors 
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and recognizers. The single pixel statistics are described by a lirxture of uniform 
and normal components. Ringe profilirg is implemented using the EMalgorithm 
lider som circunatances, least squares provides an adequate starting value. A 
continuation-style variant is described, where a range accuracy parameter is varied 
betveen EMxxivergenees froma coarse value to its true value. Geen [31] computes 
Garner- Rio bounds for the conpl ete- data case of Mxi namnli kel i hood range profil e 
estimtor, and compares simulated and real-data performance to the linhts. 

Gss [16] [15] describes an approach to visual object recognition that searches 
in pose space for maximal alignments under the bounded-error model. The pose- 
space objective function used there is piecewse constant, and is thus not amenable 
to continuous search methods. The search i s based on georetri c formalation of the 
constraints onfeasilie transformations. 

There are som connections betveen I Mil, and standard methods of robust pose 
estimation, like those described by Hralick [38], and Kmar and Hinson [48]. R)th 
can provide robust estimates of the pose of an object, based on an observed image. 

The rai n di fference is that the standard methods require specification of the feature 
correspondences, vhi 1 e FMEcbes not - by consi deri ng al 1 possi li e correspondences. 
FMErequires a starting value for the pose (as cb standard methods of robust pose 
esti rati on that use non- convex obj ecti ve f uncti ons). 

A mentioned above, Tfille, Giger and Bltboff [78] discussed computing dis¬ 
parities in a statistical theory of stereo where a marginal is computed over matches. 
Tfille extends this technique [79] toother domains of vi si on and neural networks, 
among themwnner-take-all networks, stereo, long-range rotion, the traveling sales¬ 
man prolierp deformable template rat cling, learning, content addressable memo¬ 
ries, and models of brain development. In addition to computing marginals over dis¬ 
crete fields, the Gbbe probability distribution is used This facilitates continuation- 
style optinhzation methods by vari ati on of the temperature parameter. There are 
som si rhl ari ti es betveen tli s approach and usi ng coarse-fine vbth the IMIb.ol ij ec- 
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tive function. 

Kiel nail and Rggi o [ 24] descri be a natbod of 3Drecogni ti on that uses a trai ned 
Gneralized Kidial Rsis Rnsetion network. Their natbod requires correspondences 
to be known during training and recognition. Gfe sinhlarity between their scbem 
and IMIl.i s that both are essentially arrangemnts of s moth uni nodal functions. 

There is a sinhlarity between Ibsterior Mrginal Rse Btination and hbugh 
transform(Hl) mtbods. Riughly speaking, HTmthods evaluate paranaters by 
accumulating votes in a discrete paramter space, based on observed features. (See 
the survey paper by Illingvorth and Kttler [44] for a discussion of hb ugh mtbods.) 

In a recognition application, as described here, the HTnathod would evaluate a 
discrete pose by counting the number of feature pairings that are exactly consistent 
somwbere wthinthe cell of pose space. A stated, the HTnathodhas diffiulties 
wth noisy features. This is usually addressed by counting feature pairings that are 
exactly consistent somwbere nearby the cell in pose space. 

Tbe utility of the HTas a stand-alone natbod for rec qgn i ti on i n the presence of 
noise is a topic of sona controversy This is discussed by Gimon in [34], pp. 220. 
Rrbaps tbi sis due to an unsui talie noi se mdel i npli ci t in tbe hbugh Transform 

Rsterior Mrginal Rse Bti rati on evaluates a pose by accumulating tbe loga- 
rithmof posterior mrginal probability of tbe pose over image features. 

Tbe connecti on between HTmthods and paramter eval uati on vi a tbe 1 ogari thm 
of posteri or probali 1 i ty has been descri bed by Stephens [ 67]. Stephens proposes to cal 1 
tbe posterior probability of paranaters given image observations “Tbe ftobalilistic 
hbugh Transform!! H provided an example of estimating line parameters from 
image point features whose probability densities were described as having uniform 
and normal components. H also states that tbe natbod has been used to track 3D 
objects, referring to bis thesis [68] for definition of tbe natbod used 
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7.8 Sunnary 

Amthodof evaluating poses in object recognition, Rsterior Mrginal Rse Btina- 
ti on, has been descri bed The resul ti ng obj ecti ve functi on was seen to have a si npl e 
formwhen norml feature deviationmdels and linear proj ecti on mdels are used 
Iililtedexperirental results were showiindicating that inacbnainof synthetic 
range di sconti nui ty features, the obj ecti ve funeti on nay have a pro nhm emt sharp peak 
near the correct pose. Sona local naxina vere also apparent. Aether experimnt, 
i n whi ch the features were deri ved fromvi deo i rages, was descri bed (bnnecti ons to 
robust esti rati on and neural networks vere exanhned B)unds on the perf ornanee of 
si npl i bed I Miles 1 i nators were i ndi cated, and rel ati on to other work was d scussed 



Chapter 8 


Expect at i on — Mixi ni zat i on 
Al gor i t hm 


The Expectation- Mxinhzation (ETvI algorithmras introduced in its general form 
byhfcpster, RJi n and Iai rd i n 1978 [21]. It is often useful for computing estimates 
incbmins having tvo sanple spaces, where the events inone are unions over events 
in the other. This situation holds arong the sanple spaces of Bsterior Mrginal 
Rse Estimation(EN1E) andMPMdel Mtcling. In the original paper, the vide 
generality of the EMalgorithnis discussed, along vith several previous appearances 
in special cases, and convergence results are described 

In this chapter, a specific formof the EMalgorithmis described for use vith 
I MIC It is used for hypothesis refine mn t in the recognition experimnts that are 
described in Chapter 10. Issues of convergence and inplemntation are discussed 


8.1 Efefm t i on of EMI t erat i on 

In this section a variant of the EMalgorithmis presented for use vith Rsterior 
Mrginal Bse Btination, vfichvas descri bed in Chapter 7. The follow ngmdeling 
assumptions vere used N>rml rodels are used for matched inage features, while 
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uniforrrmdels areusedfor unratched(baclground) features. If a prior ontheposeis 
available, it is nornal. The i ndependent correspondence mdel is used AUitionally, 
a linear mdel is used for feature projection. 

In 0®} the pose of an object, ft is estirated by mximzing its posterior 
probability, gi'venanirage. 


/3=argrax ftft\ Y) . 

Anecess ary condition for the raxinmis that the gradient of the posterior prob¬ 
ability wth respect to the pose be zero, or equimlently, that the gradient of the 
logari thmof the posterior probability be zero: 


0 = V plnrtj] Y) 


In Secti on 7.1, Rjuati on 7.2 the fol 1 owngforml a was gi 'venfor the posteri or prob- 
alilityof the pose of an object, gi ven an i rage. This assures use of the independent 
correspondence mdel. 

m n= i d . 

Inposingthe condition of Equation 8.1 yields the folloving, 


0=V p 



ft + ^ln^y,- | ft 

i 


Q= Ypj ft y | ft 

Ah i AY, \ft 

h in Equation 7.3, we ray write the feature UK conditioned on pose in the 
foil owng way, 

Ay, I ft = I i ) , 

r, 


or, using the specific mdel s assured in Section 7.1, as reflected in Equation 7.4, and 
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usi ng a 1 i near proj ecti on rodel, 


AY< | ft, = 


B: 


W!W 2 - ■ ■ w m 




The zero gradient condition of Equation 8.2 ray nowbe expressed as follow, 


o=Mi+y— 


1 -Bi 


m 


E, VpG^lYi-Mjk 


3 ft 


w t w 2 




1th a nornal pose prior, 


Aft = G 4>p(ft~ft o) 


and V pAft =-Aft*l> p {ft~ft o) 


The gradient of the other nornal density is 


-it# -g ,-am 


(8,3) 


ftturning to the gradient condition, and using these expressions (negated), 


0=tf , yoM + E «*« -MikMWY* -M* 


WiW 2 




Enally, the zero gradient condition ray be expressed conpactly as follow, 


0=0 ^0-ft o ) + '£W ij Mji>r}(Y i -M0 , 


* 3 


wththe followng definition: 


(8.4) 


Wi ,• = 




l ' J B t 


1 -Bi W!W 2 


Y.-Mjft 


(8.5) 


Equation 8.4 has the appearance of being a linear equationfor the pose estinate 
that satisfies the zero gradient conditionfor bei ng a raxi nnm bhfortunately, it isn’t 


ft 
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a linear equation, because W i 3 (the “wights”) are not constants, they are functions 
of fi T) find solutions to Equation 8.4, the EMalgorithmiterates the foilowng tw 
steps: 

• Treating the wights, W ij as constants, sol w Equation 8.4 as a linear equation 
for a newpose estinate f3 This is referred to as the JVfetep. 

• liing the mst recent pose estinate fi, re-evaluate the wights, W 8 according 

to Equation 8.5. This is referred to as the Estep. 

The Mstep is so namd because, in the exposition of the algorithmin [21], it 
corresponds to a Mxinnmlikelihood estirate. A discussed there, the algorithm 
is also amnalie to use in MPfornolations (life this one). lire the Mstep corre¬ 
sponds to a MP esti rate of the pose, given that the current esti rate of the wights 
is correct. 

The Estep is so namd because calculating the W ij corresponds to taking the 

expectation of somrancbmvari aides, gi vent he inage data, and that the mst recent 
pose esti rate i s correct. These rancbmvari aides haw mite 1 if the i tlii nage feature 
corresponds to the f th object feature, and 0 otherwse. Thus, after the iteration 
conwrges, the wights provide conti nuous-wl ued esti rates of the correspondences, 
that vary betwen 0 and 1. 

It seem somvbat ironic that, having abandoned the correspondences as being 
part of the hypothesis in the fornnlation of IMIE a good esti rate of themhas 
re-appeared as a byproduct of anatbodfor search in pose space. This esti rate, the 
posterior expectation, is the nhninomvarianee estimator. 

Sing essentially a local nathod of non-linear optinhzation, the EMalgorithm 
needs good starting values in order to conwrge to the right local naxinnm It my 
be started on either step. If it is started on the Estep, an initial pose estinate is 
required Mabn started on the IVbtep, an initial set of wights is needed 

Ai initial set of wights can be obtained froma partial hypothesis of correspon 
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denees in a simple manner. The weights associated with each set of corresponding 
features in the hypothesis are set to 1, the rest to 0. Indexing mthods are one source 
of such hypotheses. In Chapter 10, Aigle Ihir Indexing is used to generate candidate 
hypotheses. In this scenario, i ndexi ng provi des initial alignnants, these are refined 
usi ng the EMd gori thng then they are \eri tied by exanhni ng the wal ue of the peak of 
the FMEobj ective function that the refiner nn t step found. 

8.2 Ghmergerre 

In the original reference [21], the ETvfelgorithmwas shown to have good convergence 
properties under fairlygeneral circunatances. It is showithat the li Mi hood sequence 
produced by the algorithmis nosnotonic, i.e., the algorithmnever reduces the walue 
of the objective function (or in this case, the posterior probability) fromone step to 
the next. W[77] claim that the convergence proof in the original EMreferenee is 
Saved, and provi des another proof, as well as a thorough discussion It is possible 
that it will wander along a ri cjge, or becom stuck in a saddle point. 

In the recognition experimnts reported in Chapter 10 the al gori thmtypi cal ly 
converges in 10 - 40 iterations. 

8.3 iEjderait at ion Issues 

Som thresholding mthods vere used speed up the computation of the Eand M 
steps. 

The veigbts W ^provide a masure of feature correspondence. A the algorithm 
operates, met of the weights have walues close to zero, since most pairs of inage and 
object feature don’t align well for a given pose. In the computation of the Mstep, 
mst termvereleft out of the sung based onathresholdfor W ; 7 Som representative 

weights froman experimnt are displa)edin TUie 10.1 in Chapter 10. 

In the Estep, mst of the verkis in evaluating the Qussianfunctions, vvhichhave 
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quadratic form in them K>r the reason stated above, met of these expressions have 
values very close to zero. The evaluation of these expressions was mde conditional 
on a threshold test applied to the residuals Y i —Mjf3 \fei the (x,y) part of 
residual exceeded a certain length, zero was substituted for the value of the (Russian 
expressi on Eli es i ndexed by i rage coordi nates right provi de another effecti ve my 
of i npl enanti ng the threshol di ng here. 

The value of the IMIl.ol;j ecti ve function is conputed as a byproduct of the E 
step for little additional cost. 

8.4 M at eel Wk 

The vork of Geen [ 31] and Geen and Sbapi ro [ 32] that i s di scussed i n Secti on 7. 7 
describes use of the EMalgorithirdn a theory of laser radar range profiling. 

Iipson[50] describes a non-statistical mtbodfor refining alignnants that iterates 
solving linear system. It ratches rodel features to the nearest irage feature under 
the current pose hypothesis, while the mthod described here entertains notches to 
all of the inage features, weighted by their probability Epson’s natbodwas shovn 
to be effective and robust i n an i npl err an t, at i on that refines alignnants under Drear 
(bntflnation of Vews. 
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Angl e Rai r Indexi ng 

9.1 Inscriptionof Mthod 

Agle Rir Indexing is a simple method that is designed to reduce the amount of 
search needed in finding matches for image features in2Drecognition. It uses features 
havi ng 1 ocati on and ori entati on. 

Annrariant property of feature pairs is used to index a tali e that is constructed 
ahead of time. The property used is the pair of angles betveenthe feature orientations 
and the line j oining the feature’s locations. These angles are 9 i and 9 2 in Egure 9-1. 
The pair of angles is cl earl y i nrari art under translation, rotation, and scaling in the 
plane. 

liing orientations as -veil as point locations provides rare constraint than point 
features. Because of this, indexing ray be performd on pairs of simple features, 
rather than groups of three or rare. 

The table is constructedfromtbe object features in a pre-processing step It is 
indexed ly the angle pair, and stores the pairs of object features that are consistent 
wth the value of the angles, vitlintbe resolution of the table. The algorithmfor 
constructing the table appears below 

Adistanee threshold is used to suppress entries for features that are very close. 
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Egure 9-1: Aigles for Indexing 


Such f eat ures pairs yi el cl si oppy i li ti al pose esti nates and are poor initial hypotheses 
for recognition. 

; ; ; G ven an array model-features and a t abl e si ze, n 
; ; ; hi 1 s i n I hr 2 i ndex arr ay ANGLE- PAIR- TABLE by si de- effect. 
Bui LD- .Angle- TVBLE(mdel-features, n, cl stance-threshold) 
m<—LENGTH) mdel - features) 

; First clear the table. 

E>r i f-Oh m 
E>r j <—0 T m 

.Angle- Pai r- TAble[ 1, j] 

; ; JXbwdl 1 in t he t. abl e ent ri es . 

E>r i f-Oh m 
E>r j <—0 T m 

If i H 

fl nuclei-featraesfi] 
f2 nuclei- feat nines [ j ] 
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If Dl STANCE(fl, f2) > dstance-threshold 

< qr • C\I <T| All I M)l < IS ill. f 2, n) 

ANGLE- PAI R- TABLEfq r] ANGLE- PAI R- TABLEfq r] U4 j > 


The followng function is used to calculate the table indces for a pair of features. 

Mte that the indexing wape around 'when the angles are increased by % This 
ras done because the features used in the recognition experimnts described in this 
research are often straight edge segnsnts, and their orientations are aiinguous by7r 

; ; ; Chi cul at e i ndi ces i nt o ANGLE- FAl R- TABLE f or a pal r of f eat ures . 
Calculate- I ndi ces (f 1, f2, n) 

n 

i <-( [ %\ md n) 
j HI %\ md n) 
returned j ^ 


The followng algorithms used at recognition-tim to generate a set of pairs of 
correspondences froninage features to object features that hate consistent mlues of 
the angle pair invariant. The indexing operation sates the expense of searchng for 
pairs of object mdel features that are consistent wth pairs of irage features. fElie 
entries fromadj acent cells are included amng the canddates to accomodate angle 
mites that are “on the edge” of a cell boundary. 

; ; ; M p over t he pal rs of f eat ures i n an i rmge and generat e 
;;; candi dat e pal r s of feature correspondences 
Generate- Cwdi DATES (inage- feat ures, n) 
cand dates 0 

me-LE NGTH (i rage- f eat ures) 
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Ear i 0 T> m 

Ear j +1 to m 

<qr ><—CALCULATE- I NDI CES (inage-features[i], inage-features[j], n) 
Ear fiq <— 1 to 1 

Ear Sr <— 1 to 1 

Ear <kl >£ .ANGLE- PAI R- TAble[ ((q+Sq) nadr), ((r-|-<5r) nodn)] 
candi dates <—candi dates U «i k ><j 1 » 

ftt urn( candi dat es) 


9.2 Sparsi ficat i on 

In tie recognition experimnts described below and in Section 10.1, an additional 
technique ras used to speed up recognition-tim processing, and reduce the size of 
the table. A the table was built, a substantial fraction of the entries were left out 
of the table. These entries were selected at random This strategy is based on the 
followng observation Ear the purpose of recognizing the object, it is only necessary 
for som feature pair fronthe obj ect to be bothintbe tali e and visible intbe inage. If 
a reasonable fraction of the obj ect is visible, a substantial nunher of feature pairs wll 
be available as potential partners in a candidate correspondence pair. It is unlikely 
that the corresponding pairs of object mdel features wll all haw been rancbnhy 
el i rim ated when the table was built, evenfor fairly large anaunts of sparsification. 


9.3 M at eel Wk 

Indexing based on inwiant properties of sets of inage features has been used by 
Iandan and Wfson, in their work on geomtric hashing [49], and by Qemns and 
Jacobs [19] [20], Jacobs [45], and Thompson and Mndy [70]. In those cases the 
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inwianee is wth respect to affie transformations that bare eight paranaters. In 
this w)rk the inwianee is wth respect to translation, rotation, and scale in 2E) 
where there are four paranaters. Th ompson and Mndy describe aninwiant called 
vertex pai r s . Th ese are based on a ngl es rel ati ng to pai rs of rerti ces of 3Dpol ybedra, 
and tbei r proj ecti ons i nto 2D Agl e IM r Indexi ng i s sonawhat si nhl ar, but i s si npl er 
- bei ng desi gned for 2Dfro m2D rec pgn i ti on 

Gemns and Jacobs [19] [20], and Jacobs [45] use grouping naebanism to select 
snail sets of inage features that are likely to belong to the sana object intbe scene. 
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Chapter 10 


Recogni t i on Exper i ment s 


This chapter describes several recognition experimilts that use Tbsterior Mrginal 
Bse Estimation wth the EMAgorithm The first is a complete 2D recognition 
systemthat uses Agle Rir Indexing as the first stage. In another experinant, the 
EMEobjective function is evaluated on nurarous randomaligmants. AHtion- 
ally, the effect of occlusions on I Mil. arc investigated Enally, refinenant of 3D 
alignnants is deranstrated 

In the folloving experi rants, inage edge curves vere arbitrarily subdivided into 
f ragnants for feature extracti on The recogni ti on experi ren ts based on these features 
show good performance, but the performance light be infroved if a rare stalie 
subdi vi si on techni que vere used 


10.1 2DRcognition Eqierimrts 

The experi rants described in this section use the EMalgorithmto carry out local 
searches in pose space of the EMffbobjecti ve function This is used for evaluating 
and refining alignnants that are generated by Agle Ihir Indexing. Acoarse - fine 
approach i s used i n refini ng the al i gnrants produced by Agl e Ri r Indexi ng. T> thi s 
end, tvo sets of features are used, coarse features and fine features. 
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Egure 10-1: Gayscale Inage 
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Egure 10-2: (Sarse Mclel and Inage Rallies 
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E gure 10- 3: E ne Mclel aixl Inage Katures 
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The vi deo i nage used for the recogni ti on experi ran t appears i n E gure 10-1. The 
mdel features vere derived fromMan Elge Inages, as described in Section 4.4. 

The standard deviation of the smoothing that was used in preparing the mdel and 
i nage edge nape was 3.97 for the coarse features, and 1.93 for the fine features. The 
edge curves were broken arbitrarily every 20 pixels for the coarse features, and every 
10 pixels for the fine features. Rint-radius features were fitted to the edge curve 
fragrants, as described in Section 5.3. The coarse mdel and irage features appear 
i n E gure 10- 2, the fine mdel and i nage features appear i n E gure 10- 3. There are 81 
coarse mdel features, 334 coarse inage features, 246 fine mdel features, and 1063 
fine inage features. 

The oriented stationary statistics mdel of feature fluctuations was used (this 
is described in Section 3.3). The pararaters (statistics) that appear in the I Mil, 
objective function, the background probability and the covariance matrix for the 
oriented stationary statistics, were deri ved fromnatches that were done by hand. 

These training matches were also used in the erprical study of the goodness of 
the normal mdel for feature fluctuations discussed in Section 3.2.1, and they are 
described there. 

10.11 GeHctirg Aigrals 

Initial alignrants were generated using Agle Rir Indexing (described in Chapter 9) 
on the coarse features. The angle pair table was constructed vith 80 by 80 cells, and 
sparsification was used - 5 percent of the entries vere randonhy kept. The distance 
threshold vas set at 50 pixels (the inage size is 640 by 480). The resulting table 
contained 234 entries. M4h these values, umfornhy generated rancbmangle pairs 
hare . 0365 probali 1 ity of “hitting” in the table. 

Mabnthe inage feature pairs were indexed into the table, 20574 candidate feature 
correspondence pairs were generated. This is considerably fever that the 732 ml lion 
possible pairs of correspondences in this situation Egure 10-4 illustrates three of 



140 


CHAPTER 10. RECXXM PI CN EXPERIAENIS 


the candidate alignnants by superinposing the object in the irages at the pose 
associated wththe initial alignnant inpliedby the pairs of feature correspondences. 
The i ndi cated scores are the negati ve of the FMEobj ecti ve functi on corputed wth 
the coarse features. 

1012 Saiig Lilxi’ Ai^mls 

The initial alignnants were evaluated in the followng my The indexing process 
produces hypotheses consisting of a pair of correspondences fromirage features to 
object features. These pairs of correspondences vere comer ted into an initial wight 
natrixfor the EMalgorithm The Mstep of the algorithmms run, producing a 
ro ugh alignnant pose. The pose ms then evaluated using the Estep of the EM 
algorithm^ which computes the value of the objective function as a side effect (in 
addition to a newestirate of the weiglls). Thus, running one cycle of the EM 
algorithm^ initialized by the pair of correspondences, generates a rough alignnant, 
and evaluates the ENEobj ecti ve function for that alignnant. 

1Q13 Bfiriig Lifee* Aigrats 

This section illustrates the nathod used to refine indexer alignnants. 

Egure 10-5 show a closer view of the best scoring initial alignnant fromAigle 
IMr Indexing. The initial alignnant ms refined by running the EMdgor it hnto con¬ 
vergence using the coarse features and statistics. The result of this coarse rehnenant 
i s di spl ayed i n E gure 10- 6. The coarse rehnenant ms refined further by runni ng the 
EMalgorithmto convergence wth the fine features and statistics. The result of this 
fine rehnenant i s showi i n E gure 10- 7, and over the vi deo i rage i n E gure 10- 8. 

Gound truth for the pose is available in this experinant, as the true pose is the 
null pose. The pose before rehnenant is 


[.99595, -0.0084747, -0.37902, 5.0048] 


X 
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Score: -84.9719 


Egure 10-5: ftst Aignmnt fromlixlexer, wth (Sarse Score 



Score: -91.2349 


Egure 10-6: (Sarse Mnemnt, wthGarse Score 
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figure 10-8: Ene ftfinenant wth Vdeo Inage 
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Egure 10-9: (brrespondences wthWgli larger than.5 


and after tlx refinensnt it is 

[ 1.00166,0.0051108, 0.68621, -4.7817] T . 

TTx eixocl ng of tlxse poses i s described in Section 5.3 (tlx null poseis [1,0, 0,0] 

TTx iiitial pos||is in error 1y about .01 in scale aixl 5 pixels in position The final 
pose errs by about .005 in scale and 1.8 pixels in position Thus scale accuracy is 
improved by a factor of about tw>, and position accuracy is irproved by factor of 
about three. Aiexperiment shoving rare dramatic, irprovemnt is describedbelovy 
i n Secti on 10.4.1. 

In tlxse experi rants, less that 15 iterations of tlx EMdgorithmvere needed for 
convergence. 


IQ 14 Brri BVM§ts 
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A discussed in Section 8.1, a nice aspect of using the EMalgorithmvith FMEis 
that estinates of feature correspondences are available in the 'weight natrix. Egure 
10-9 displays the correspondences that have wight greater than .5, for the final 
convergence shown in Egure 10-7. Kre, the inage andmdel features are displayed 
as thin curves, and the correspondences betwen themare shown as heavy lines 
j oining the features. N)te the strong si nhlarity between these correspondences, and 
those that the systemwas trained on, sbowiin Egure 3-2. 

Ulie 10.1 displays the values of som of the wights. The weights showhaw 
value greater than .01. There are 299 wights this large arong the 413,507 wights. 
The 39 wights sbowiare those belonging to 20 inage features. 


10.2 Bvel mt i rg RurfomA i gnnart s 

Ai experimnt was perfornad to test the utility of I Mil, i n evaluating rancbnhy 
generated alignmnts. Correspondences amng the coarse features described in Sec¬ 
tion 10.1 having assignmnts fromtw> imge features to tw> rndel features wre 
rancbiiiy generated, and evaluated as in Section 10.1.2. 19118 rancbmaligamuts 
wre generated, of which 1200 had coarse scores better than -30.0 (the negatiw of 
the FMEobjective function). Aong these 1200, one was essentially correct. The 
first, second, third, fourth, fifth, and fifteenth best scoring alignmnts are sbowiin 
Egure 10-10. 

Wh coarse - fine refinemnt, the best scoring rancbmalignmnt converged to 
the sam pose as the best refine mn t, fromtbe indexing experimnt, shown in Egure 
10-7, wth fine score -355.069. The next best scoring rancbmali gnmnt converged to 
a grossly vrong pose, wth fine score -149.064. This score provides som iideation 
of the noi se 1 ewl i n the fine IMIl.ol ;j ecti ve functi on i n pose space. 

This test, tho ugh not exhaustive, produced no false positives, in the sense of a bad 
ali gnmnt wth a coarse score better than that of the correct ali gnmnt. Adtionally, 



146 


CHAPTER 10. RECXXM PI CN EXPERIAENIS 


Inage Index 

Mdel Index 

Wgbl 

90 

86 

0.022738026840027032 

90 

101 

0.014615921646994348 

90 

102 

0.807966693444096 

90 

103 

0.09581539482455806 

91 

103 

0.9633441301926663 

92 

85 

0.24166197059125494 

92 

103 

0.19778274847425015 

93 

87 

0.02784697957543993 

93 

88 

0.37419218245379466 

94 

87 

0.7478667723520142 

95 

87 

0.44030413275215486 

96 

86 

0.6127902576993082 

97 

85 

0.9293665165549775 

98 

85 

0.8621763443868999 

99 

84 

0.9634827438267516 

100 

5 

0.6499527214931725 

100 

84 

0.19705210016850308 

101 

0 

0.011400725934573982 

101 

67 

0.9559675939354566 

102 

66 

0.9194110795990801 

102 

67 

0.0541643593533511 

103 

64 

0.04765362703894284 

103 

65 

0.8454128520499249 

103 

66 

0.05787873660955701 

104 

63 

0.05270908685541295 

104 

64 

0.8854088356653954 

104 

65 

0.014744194821866506 

105 

62 

0.06158503222464117 

105 

63 

0.9139939556525918 

106 

61 

0.09270729594689026 

106 

62 

0.8635739185353283 

106 

63 

0.010447389024937672 

107 

61 

0.9108899984969661 

107 

62 

0.021204649868405194 

108 

60 

0.861831671427887 

108 

61 

0.049220125250993084 

109 

58 

0.018077232316743887 

109 

59 

0.9257311183042934 

109 

60 

0.015434004217119308 


lElie 10.1: Som EMWgbls 
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Egure 10-10: RixbmAignnaiits 
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the fine score of the refinemnt of the most praising incorrect randoinal i gnnant 
was significantly lover than the fine score of the (correct) refined best alignment. 

10.3 Convergence wth Qcl nsi on 

The convergence behavior under occlusion of the IMdgorithnawthMEwas eval¬ 
uated usi ng the coarse features descri bed i n Secti on 10.1. Images features si nd ati ng 
varying amunts of occlusion were preparedly shifting a varying portion of theinage. 
These images, along wth results of coarse - fine refinem en t using the EMalgorithmi 
are showi i n E gure 10-11. 

The starting value for the pose was the correct (null) pose. The refinem en ts 
converged to good poses in all cases, demonstrating that the method can converge 
fronagCHxl alignments wth moderate amounts of occlusion 

The final fine score in the most occluded example is lover than the noise level 
observed in the experiment of Section 10.2. This indicates that as the amount of 
occlusion increases, a point wll be reached where the method wll fail to produce a 
good pose havi ng a score above the noi se 1 evel. In tli s experi n em t i t happens before 
the method fails to converge properly 

10.4 3DIfecqgri t i on Ekperi rail s 

1041 KfnrgSD Aiguils 

This section demonstrates use of the EMalgorithnewth IMI1.1 o refine alignments 
in 3Drecognition The linear conli nation of view method is used to accommodate 
a lililted amount of out of plane rotation Atwsviewvariant of K\( described in 
Secti on 5.5, is used 

A coarse - fine approach was used (barse I Mil, scores were computed by 
snootbi ng the IMIl.ol ;j ecti ve f uncti on, as descri bedi n Secti on 7.3.2. The snoothi ng 
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Egure 10-12: Gayscale Inage 


mtrix ms 

DAJ(7.07) 2 , (3.0) 2 ) . 

These numhers are the amounts of additional (artificial) variance added for piarallel 
and perpendi cul ar deviations, respectively, in the oriented stationary statistics mdel. 

The video test inage is showi in Egure 10-12. It differs fromthe mdel images 
by a significant 3Dtranslation aixl out of plane rotation The test inage edges are 
showi i n E gure 10-13. 

The object mdel ms derived fromthe tv© Man Rlge Images showi in Egure 
10-14. These vere constructed as descri lied i n Secti on 4.4. 

The smoothing used in preparation of the edge naps had 1.93 pixels standard 
devl ati on, and the edge curves vere Ixolen arli trari 1 y every 10 pi xel s. Ri nt- rad us 
featru’es vere fitted to the edge curve fragments, as descri lied in Section 5.3, for 
purposes of cl spi ay and for computi ng the ori ented stati onary stati sti cs, al though the 
features used wth IMH.and the EMalgoritlmiere simply the .Akxl 1 coord nates 
of the centroids of the curve fragments. B)th view of the mdel feat ires are showi 
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inEgure 10-15. The linear coda nation of view mt bod requires correspondences 
a mng the mdel view. These were established by band, and are cl spl ayecl i n E gure 
10-16. 

The rel ati onsli p amng the vi evpoi nts i n the mdel i nages aixl the test i rrage i s 
illustrated in Egure 10-17. This represents the region of the viewspbere containing 
the viewpoints. Mte that the test inage is not on the line joining the two mdel 
view. 

The oriented stationary statistics mdel of feat my fluctuations ms used (this is 
described in Section 3.3). A in Section 10.1, tire paramters (statistics) that appear in 
tire IMIl.ol ij ecti ve f uncti on, tire baclground probali 1 i ty and tire comri ance mtri x 
for tire oriented stationary statistics, were derived fronmatcbes done by band 

Aset of four correspondences ms estaliisired manually fromtlre inage features 
to tire object features. These correspondences are intended to simulate an al i gnron t 
generated ly an indexing system Indexing system that are sritalie for 3Drecqgui- 
ti on are descri bed ly (] e mir s and Jacobs [ 19] and Jacobs [ 45]. The ro ugh al i gnmn t 
and score were obtained front be correspondences ly one cycle of tire EMilgoritlnpi 
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Score: -10.8428 


Egure 10-19: (Sarse ftfined Aignnanl aixl (Sarse Score 



Egure 10-20: Ene Ilfined Aignnanl and Ene Score 
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figure 10-21: Ene Rfined Aignnant wth Vdeo Inage 
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as descri bed above i n Secti on 10.1.2. They are di spl ayed i n E gure 10-18, where the 
four corresponding features appear circled Acoarse alignnant was then obtained 
by running the EMalgorithmto convergence wth smothing, the result appears in 
Egure 10-19. This alignnant was refined further by running the EMdgor it hmagain, 
wthout smothi ng. The resul ti ng al i gnmnt and score are shown i n E gure 10- 20. In 
these fgures, the inage features are shown as curve fragments for clarity, although 
oil y the poi nt 1 ocati ons were used i n the experi ran t. The i rage features used are a 
subset takenfroma rectangular region of the larger inage. 

Egure 10-21 displays the final alignnant superinposed over the original video 
i rage. Afet of the mdel features have al i gned vel 1. The di screpaney i n the f orrard 
wheel veil ray be due to inaccuracies in the ECVmdeling process, perhaps in the 
feature correspondences. This figure de ran strates goodresults for aligning a smoth 
3Dobject havi ng si x degrees of freecbnrf nation, wthout the use privileged features. 

1042 Rftirg I&tirhed Bees 

This secti on describes an additional demonstration of local search i n pose space usi ng 

IMli.i n 3D 

The pose corresponding to the refined alignnant displa^d in Egure 10-20 was 
perturbed by add ng a d spl acerant by 4.0 pi xel s i n Y. Thi s pose ras then refined 
by running the EMalgorithmto convergence. The perturbed alignnant and the 
resulting coarse - fine refinenant is showi in Egure 10-22. The result is very close 
to the pose prior to perturbation 

Asinhlar experi rant was carried out wth a larger perturbation, 12.0 pixels in 
Y. The results of this appear in Egure 10-23. This tina the convergence is to 
a clearly wong alignnant. The mdel has been stretched to a thin configuration, 
and nhsnatched to the inage. The resulting fine score is lover than that of the 
good alignmnt in Egure 10-21. This illustrates a potential drawback of the linear 
conta nation of view nathod In add ti on to correct view, ECV can synthesize 
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E gure 10- 22: Ml d y Rrturbed Ai gnmnt aixl ftsul ti rg Mnemnt 
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Egure 10-23: Rrturbed Aignnant and Ksulting ftfinenant wthEne Score 
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view where the mdel is stretched I£\f as used here, has 8 paranaters, rather 
than the 6 of rigid nation The tv© extra paranaters deteriime the stretching part 
of the transformation This proliemcanbe addressed by checking, or enforcing, a 
quadratic constraint on the paranaters. This is discussed in [71]. 

Aether sinhlar experinant was performd starting wth a very bad aligmant. 
The results appear in Egure 10-24. The algorithmwas able to bring som features 
into alignnant, but the score remained low 
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Chapter 11 


Cbncl us i ons 


Vsual object recognition - finding a knovn object in scenes, -where the object is 
smoth, is viewed under varying illunhnation conditions, has six degrees of freedom 
of position, is subject to occlusions and appears against varying backgrounds - still 
presents problem. In this thesis, progress has been mde by applying natbods of 
statistical inference to recognition Eer-present uncertainties are accomodated 
by statistical characterizations of the recognition problem MP Mdel Mtcling 
i Mil and Bsterior Mrginal Rse Bti nation ( I Mil). MMwas showito be 
effective for searching arong feature correspondences and I MU . was shown effective 
for searches i n pose space. The i ssue of acqui ri ng sal i ent obj ect features under varyi ng 
i 11 unhnati on was addressed by usi ng Man Kjge Inages. 

The alignmnt approach, vfich 1 everages fast indexing nathods of hypothesis 
generation, is utilized. Agle Rir Indexing is introduced as aneffiient 2Dindexing 
mthod that does not depend on extended or special features that can be hard to 
detect. Ai extension to the alignmnt approach that my be sunmri zed as al ign 
refine verify is advocated. The EMalgorithmis enplo^dfor refining the estinate of 
the object’s pose while simultaneously identifying and incorporating the constraints 
of all supporting irage features. 

Aeas for future researchinclude the folloving: 
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• Indexing was rot usedintbe 3Dreccgnitionexperimiits. Identifying asuitalie 
mcbanisnfor this purpose that nashes veil wththe type of features usedhere, 
voul d be ail i nprove rom t. 

• To few view vere used i n rodel construction. Elly autorating the nadel 
acquisition process, as described in Chapter 4, and acqui ri ng rodel s fromrore 
view TOuldbelp. 

• Extending the fornnlations of rec qgn i ti on tohandl e multiple obj ects is straight¬ 
forward, but identifying sui tali e search strategies is an important and non 
trivial task 

• Ineorporati ng non 1 i near nadel s of proj ecti on i nto the fornd ati on w>ul d al 1 ow 
robust performance in cbnains baving serious perspective distortions. 

• IJing i nage-1 ihe tali es could speed the evaluation of the FMEobj ecti ve func¬ 
tion 

• Investi gating the use of I Mil, i n obj ect tracking or in other active vision do¬ 
mains light prove fruitful. 

Mre w>rk in these areas wl 1 lead to practical and robust object rec qgn i ti on 
system. 
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Not at i on 


Symbol 

Mani ng 

Defir, 

Y={Y u Y 2 ,...,Y n } 

the imge 

2.1 

n 

nunher of imge features 


Y eR v 

imge feature 

2.1 

AE{M . \l ) 

the obj ect mdel 

2.1 

m 

nunher of object features 


M 3 

mdel feature, frequently M j £R VXZ 2.1 

A 

the background feature 

2.1 

r={r 

correspondences 

2.1 

I\- GMJ{_L} assignmnt of imge feature i 

2.1 

l')€R ‘ 

pose of obj ect 

5.1 

Wi,a 

proj ecti on i nto i mge 

5.1 

Gy ( 2 ) 

Gcussi an probali li ty densi ty 

3.2 6.1 


cowi anee mtri x of feature pai r 

3.3 

i 

stationary feature covariance mtrix 

3.3 

Ay 

cowi anee mtri x of pose pri or 

6.1 

$ B i 

background probali 1 i ty 

2.2 2.4 

w k 

extent of imge feature dimansion k 

3.1 

Aj y A 

correspondence rernrd 

6.1 

X 

estimte of x 


A-) 

probali 1 i ty 

(see 


below) 


Kobalility rotation is sonawbat abused in this w>rk, in the interest of brevity 
my stand for either a probability mss function of a discrete variable 2 ; or fora 
probali 1 i ty densi ty f uncti on of a conti nuous vari ali e x The mani ng wl 1 be cl ear i n 
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context based on the type of the wi ali e argunant. AH ti onal 1 y, nirxed probali 1 i ti es 
are described wth the sana notation Ear exanple j(T, f3\ Y) stands for the nirxed 
probability function that is a probability nass function of T (the discrete 'variable 
describing correspondences), and a probability density function of /3(the pose vector) 

- both condi ti oned on Y (the i nage feature coord nates). 
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